This paper gives a theoretical treatment of several properties which
describe certain variable-length binary encodings of the sort which could
be used for the storage or transmission of information. Some of these,
such as the prefix and finite delay properties, deal with the time delay
with which circuits can be built to decipher the encodings. The
self-synchronizing property deals with the ability of the deciphering
circuits to get in phase automatically with the enciphering circuits.
Exhaustive encodings have the property that all possible sequences of
binary digits can occur as messages. Alphabetical-order encodings are
those for which the alphabetical order of the letters is preserved as the
numerical order of the binary codes; they may be of value for sorting data
or for consolidating files or dictionaries. Various theorems are proved
about the relationships between these properties, and also about their
relationship to the average number of binary digits used to encode each
letter of the original message.

I. INTRODUCTION 

Table I gives three different encodings for representing the letters of 
the alphabet and the space symbol in binary form. These encodings 
have several special properties which are of some interest. First, each 
is a variable-length encoding; that is, the code for each letter is a sequence 
of binary digits, but the codes assigned to different letters are not all 
required to consist of the same number of binary digits. The first two 
of these encodings have the prefix property; that is, no one of the codes 
is a prefix of any other code of the same encoding. This property makes 
it easy to decipher a message: to find the first letter of the deciphered
message, one need only read binary digits of the message until the digits
read so far agree with one of the codes.
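The deciphering rule just described can be sketched in Python. This is a minimal illustration under our own assumptions, not a description of any circuit: an encoding is represented as a dictionary from letter names to strings of the characters "0" and "1", it is assumed to have the prefix property, and the name `decipher_prefix` is hypothetical. The sample codes are five of the Huffman codes of Table I.

```python
def decipher_prefix(bits: str, encoding: dict) -> list:
    """Decipher a finite enciphered message, assuming the encoding has the prefix property.

    Digits are read one at a time. As soon as the digits read since the last letter
    agree with some code, that letter is emitted; the prefix property guarantees that
    no longer code could also agree, so no look-ahead is needed.
    """
    letter_for_code = {code: letter for letter, code in encoding.items()}
    letters = []
    pending = ""
    for digit in bits:
        pending += digit
        if pending in letter_for_code:
            letters.append(letter_for_code[pending])
            pending = ""
    if pending:
        raise ValueError(f"message ends in the middle of a code: {pending!r}")
    return letters


# Five of the Huffman codes of Table I.
HUFFMAN_SAMPLE = {"Space": "000", "A": "0100", "E": "101", "H": "1110", "T": "0010"}

message = ["T", "H", "E", "Space", "A", "T", "E"]
bits = "".join(HUFFMAN_SAMPLE[letter] for letter in message)
print(decipher_prefix(bits, HUFFMAN_SAMPLE))  # ['T', 'H', 'E', 'Space', 'A', 'T', 'E']
```

Because each letter is emitted the moment its last digit arrives, the delay between receiving and deciphering is never longer than one code.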

The first of these encodings, called the Huffman encoding, is
constructed by the method given by Huffman (Ref. 1), and has the property of
being a minimum-redundancy encoding; that is, among all variable-
length binary encodings having the prefix property, this is an encoding
having the lowest possible cost (where the cost is defined as the average
number of binary digits used per letter of the original message, assuming
that the message is made up of letters independently chosen, each with
the probability given).

Table I

Letter   Probability   Huffman Code   Alphabetical Code   Special Code
Space    0.1859        000            00                  00
A        0.0642        0100           0100                0100
B        0.0127        011111         010100              010100
C        0.0218        11111          010101              010101
D        0.0317        01011          01011               01011
E        0.1031        101            0110                0110
F        0.0208        001100         011100              011100
G        0.0152        011101         011101              011101
H        0.0467        1110           01111               01111
I        0.0575        1000           1000                1000
J        0.0008        0111001110     1001000             10001111111
K        0.0049        01110010       1001001             100100
L        0.0321        01010          100101              100101
M        0.0198        001101         10011               10011
N        0.0574        1001           1010                1010
O        0.0632        0110           1011                1011
P        0.0152        011110         110000              11000
Q        0.0008        0111001101     110001              110001111111
R        0.0484        1101           11001               11001
S        0.0514        1100           1101                1101
T        0.0796        0010           1110                1110
U        0.0228        11110          111100              111100
V        0.0083        0111000        111101              111101
W        0.0175        001110         111110              111110
X        0.0013        0111001100     1111110             1111101111111
Y        0.0164        001111         11111110            1111110
Z        0.0005        0111001111     11111111            11111101111111
Cost                   4.1195         4.1978

The second of these encodings, called the alphabetical encoding, has 
the property that the alphabetical order of the letters corresponds to 
the numerical binary order of the codes. Among all such alphabetical- 
order-preserving binary encodings that are of variable length and have 
the prefix property, the one given has been constructed to have the 
lowest possible cost. It can be seen that the cost 4.1978 of the alpha- 
betical encoding is quite close to the cost 4.1195 of the Huffman encoding, 
as compared to the cost 5 of the more conventional fixed-length encoding 
for the same alphabet, so that the alphabetical restriction adds surpris- 
ingly little expense to a variable-length encoding. 
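The costs quoted here can be checked directly from Table I. The sketch below uses exact rational arithmetic from Python's standard `fractions` module; the names `TABLE_I` and `cost` are ours. It recomputes the two costs, the cost 5 of a fixed-length five-digit encoding, and confirms that the codes of the alphabetical column are in increasing numerical order.

```python
from fractions import Fraction

# Table I in alphabetical order (the space symbol first):
# (letter, probability, Huffman code, alphabetical code)
TABLE_I = [
    ("Space", "0.1859", "000", "00"),
    ("A", "0.0642", "0100", "0100"),
    ("B", "0.0127", "011111", "010100"),
    ("C", "0.0218", "11111", "010101"),
    ("D", "0.0317", "01011", "01011"),
    ("E", "0.1031", "101", "0110"),
    ("F", "0.0208", "001100", "011100"),
    ("G", "0.0152", "011101", "011101"),
    ("H", "0.0467", "1110", "01111"),
    ("I", "0.0575", "1000", "1000"),
    ("J", "0.0008", "0111001110", "1001000"),
    ("K", "0.0049", "01110010", "1001001"),
    ("L", "0.0321", "01010", "100101"),
    ("M", "0.0198", "001101", "10011"),
    ("N", "0.0574", "1001", "1010"),
    ("O", "0.0632", "0110", "1011"),
    ("P", "0.0152", "011110", "110000"),
    ("Q", "0.0008", "0111001101", "110001"),
    ("R", "0.0484", "1101", "11001"),
    ("S", "0.0514", "1100", "1101"),
    ("T", "0.0796", "0010", "1110"),
    ("U", "0.0228", "11110", "111100"),
    ("V", "0.0083", "0111000", "111101"),
    ("W", "0.0175", "001110", "111110"),
    ("X", "0.0013", "0111001100", "1111110"),
    ("Y", "0.0164", "001111", "11111110"),
    ("Z", "0.0005", "0111001111", "11111111"),
]

def cost(probabilities, codes):
    """Average number of binary digits per letter, sum of p_i * N_i, computed exactly."""
    return sum((Fraction(p) * len(c) for p, c in zip(probabilities, codes)), Fraction(0))

probabilities = [row[1] for row in TABLE_I]
huffman = [row[2] for row in TABLE_I]
alphabetical = [row[3] for row in TABLE_I]

print(float(cost(probabilities, huffman)))         # 4.1195
print(float(cost(probabilities, alphabetical)))    # 4.1978
print(float(cost(probabilities, ["00000"] * 27)))  # 5.0, the fixed-length encoding
# For codes with the prefix property, string order is the numerical order of the
# codes read as binary fractions, so this checks the alphabetical column's order.
print(alphabetical == sorted(alphabetical))        # True
```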

Part of this paper deals with the methods of constructing such best 
alphabetical encodings, and gives some theorems concerning their cost 
and their structure. However, this paper also includes theoretical results 
about various properties of variable-length binary encodings in general.
The cost, the prefix property and unique decipherability have already 
been mentioned. The exhaustive property (roughly speaking, this 
permits all infinite binary sequences to occur as encoded messages) is 
also shown to be relevant, as is the finite delay property, which has to do 
with the amount of delay which must take place between receiving and 
deciphering the enciphered message. Various theorems are proved con- 
cerning the relationships of these properties to each other and to other 
properties. Some of these properties have also been considered by other
authors (Refs. 1 to 6).

One property of special interest is the ability of certain variable-length 
encodings (but not of fixed-length encodings) to automatically
synchronize the deciphering circuit with the enciphering circuit. This
self-synchronizing property, while it has been previously mentioned (Ref. 5), is a
little-known property which might have practical significance in that it 
would permit binary deciphering machines using variable-length encod- 
ings to be built without requiring any special synchronizing circuits or 
synchronizing pulses, such as are needed for fixed-length encodings. 
Thus, there may be cases where (despite some present opinions to the 
contrary) variable-length encodings lend themselves to simpler instru- 
mentation than fixed-length encodings. 

Since the probabilities given in Table I are derived from one of the
tables of frequencies of letters in English text (Ref. 7), the encoding given
should be reasonably efficient for encoding English words or phrases.
The alphabetical property, together with the prefix property, implies 
that two such words or phrases could be compared for alphabetical 
order merely by putting the two entire phrases into a simple comparison 
circuit of the kind which would be used to compare binary numbers. If 
the two phrases begin with the same sequence of letters, the correspond- 
ing parts of their enciphered form would agree, and the outcome of the 
binary comparison would be determined by the comparison between 
the two binary codes corresponding to the first pair of letters which 
disagree. 
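The comparison argument can be tried out directly. In this sketch (the names `ALPHABETICAL`, `encipher` and `precedes` are hypothetical), phrases are enciphered with the alphabetical code of Table I and compared as strings of digits. For two codes neither of which is a prefix of the other, string comparison agrees with comparing the binary fractions obtained by prefixing a binary point. When one enciphered phrase is a prefix of the other, string comparison puts the shorter one first, which matches the dictionary convention that THE precedes THE CAT.

```python
# Alphabetical code of Table I, with the space symbol written as " ".
ALPHABETICAL = dict(zip(
    " ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    ("00 0100 010100 010101 01011 0110 011100 011101 01111 1000 1001000 "
     "1001001 100101 10011 1010 1011 110000 110001 11001 1101 1110 "
     "111100 111101 111110 1111110 11111110 11111111").split(),
))

def encipher(phrase: str) -> str:
    """Concatenate the codes of the letters of phrase."""
    return "".join(ALPHABETICAL[ch] for ch in phrase)

def precedes(a: str, b: str) -> bool:
    """Decide whether phrase a comes before phrase b using only their enciphered forms."""
    return encipher(a) < encipher(b)

words = ["ZOO", "THE CAT", "THEN", "A", "THE", "AB", "THE DOG"]
print(sorted(words, key=encipher))  # ['A', 'AB', 'THE', 'THE CAT', 'THE DOG', 'THEN', 'ZOO']
```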

Placing the space symbol before the letter A of the alphabet corre- 
sponds to the usual convention governing the filing of multiple-word 
entries in alphabetical order, although if it were desired also to include 
punctuation marks or numerals in the alphabet, the conventions are not 
so universal, and might not be of the sort which can easily be expressed 
in a binary encoding. 

An alphabetical encoding might be used as a means of saving memory 
space needed for names or other alphabetical data that are to be sorted 
into alphabetical order on a data-processing machine or are to be stored
in a file in alphabetical order. Similarly, it might be used for the words of
a dictionary as a part of a language-translating machine, if it were
desired to preserve the conventional alphabetical order of dictionaries.
In addition to possible savings of memory space, it might be used to find
entries in such a dictionary more quickly. Since the low redundancy of
this encoding causes the digits 0 and 1 to be used with more nearly equal
frequency and more nearly independently than in a fixed-length encoding,
the binary numerical value associated with each word would increase
more nearly as a linear function of distance progressed through the
dictionary; hence, instead of searching for a given word by the method
of successively halving the interval in which it is known to lie, linear 
interpolation (or some rough approximation to it which might be done 
by a simpler circuit) could be used to speed up convergence. However, 
for uses such as mentioned here, the particular alphabetical encoding 
given in Table I is not necessarily the optimum, since the frequencies of 
occurrence of letters in names or in dictionary entries are undoubtedly 
different than they are in connected English text. However, the methods 
given in this paper would enable such an encoding to be obtained for 
any given probability distribution. 
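The text does not specify a search procedure; one standard way to realize the linear-interpolation idea is interpolation search, sketched below under our own assumptions. Each enciphered entry is read as a binary fraction (binary point prefixed), the list is assumed sorted, and each probe is placed where linear interpolation between the values at the two ends of the current interval predicts the target. The Table I alphabetical codes stand in for a sorted list of enciphered dictionary entries; `binary_value` and `interpolation_search` are hypothetical names.

```python
from fractions import Fraction

def binary_value(bits: str) -> Fraction:
    """The number obtained by putting a binary point in front of the digits."""
    return Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)

def interpolation_search(keys: list, target: str) -> int:
    """Index of target in keys, a list of digit strings in increasing order, or -1."""
    lo, hi = 0, len(keys) - 1
    t = binary_value(target)
    while lo <= hi:
        lo_v, hi_v = binary_value(keys[lo]), binary_value(keys[hi])
        if not (lo_v <= t <= hi_v):
            return -1  # target lies outside the current interval, so it is absent
        if hi_v == lo_v:
            probe = lo
        else:
            # Linear interpolation between the values at the two ends of the interval.
            probe = lo + int((t - lo_v) / (hi_v - lo_v) * (hi - lo))
        if keys[probe] == target:
            return probe
        if keys[probe] < target:
            lo = probe + 1
        else:
            hi = probe - 1
    return -1

# Stand-in for a sorted list of enciphered entries: the alphabetical codes of Table I.
KEYS = ("00 0100 010100 010101 01011 0110 011100 011101 01111 1000 1001000 "
        "1001001 100101 10011 1010 1011 110000 110001 11001 1101 1110 "
        "111100 111101 111110 1111110 11111110 11111111").split()
print(interpolation_search(KEYS, "1010"))  # 14, the position of N
```

Reading a string as a binary fraction never reverses the string order of the list, so the interval test is safe; the more evenly the values are spread, the closer the first probe lands to the target.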

II. TERMINOLOGY 

We will use the word letter to refer to any symbol of some designated 
list, including even the space symbol of Table I. By an alphabet we will 
mean a set of letters. We will usually require each member of an alphabet 
to have associated with it a probability of occurrence, and we will also 
usually require that some linear ordering relationship (which we will 
call alphabetical order) be defined for the letters of this alphabet. So that 
we may call any subset of the letters of an alphabet a subalphabet, and 
may keep the same ordering and the same probabilities, we will require 
only that the sum of the probabilities be less than or equal to one. All 
of the alphabets considered in this paper have only a finite number n 
of letters, but it might be advisable to allow countably infinite alphabets 
in certain further theoretical extensions of this subject. 

A message is a finite sequence of letters, or an infinite sequence $L_1 L_2 L_3 \cdots$
which extends infinitely only into the future, not into the past. We
will consider a source which generates messages in which successive 
letters occur independently and with the given probabilities. However, 
in case the sum of the probabilities is less than one, we may imagine that 
the probabilities are proportionately increased just enough that their 
sum becomes one, so that the associated source is more realistic. 



VARIABLE-LENGTH BINARY ENCODINGS 937 

We distinguish between code and encoding, both of which are often 
called codes by other writers. A code is a finite sequence of binary digits. 
An encoding is a way of associating (or more formally, a function $C$
which associates) a code $C_i$ with each letter $L_i$ of an alphabet.

The operation of enciphering (elsewhere often called encoding) con- 
structs a sequence of binary digits which is made up of the code for the 
first letter of the message, followed immediately by the code for the 
second letter of the message, etc. Any message then produces a sequence 
of binary digits called the enciphered message. Any machine or circuit 
which does the operation of enciphering is called an enciphering machine 
or an enciphering circuit. The enciphered message of a finite message is 
obviously always finite. 

An encoding will be said to be uniquely decipherable if, for each finite 
enciphered message, there exists exactly one original message which 
could have produced it. If an encoding is uniquely decipherable, then 
there is obviously a procedure for deciphering any finite enciphered 
message (by enumeration, for instance), and any machine or circuit 
capable of doing this will be called a deciphering machine or a deciphering 
circuit. 

Following Huffman (Ref. 1), we define a prefix of any sequence $\Phi$ of binary
digits to be any finite sequence which is either $\Phi$ itself or is obtainable
by deleting all of the digits after a given point of $\Phi$. For example, the
prefixes of 10110 are 10110, 1011, 101, 10, 1, and the null sequence, 
which has no digits. We will say that an encoding C has the prefix 
property if no code of C is a prefix of any other code of C. 
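Both definitions translate directly into code. The sketch below (function names are ours) lists the prefixes of a sequence, reproducing the example 10110, and tests the prefix property by comparing every pair of codes; the second test case is the list 0, 01, 11 that is discussed later as a uniquely decipherable list without the prefix property.

```python
def prefixes(phi: str) -> list:
    """All prefixes of phi: phi itself, then shorter ones, ending with the null sequence ''."""
    return [phi[:k] for k in range(len(phi), -1, -1)]

def has_prefix_property(codes: list) -> bool:
    """True if no code is a prefix of any other code of the same encoding."""
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(codes)
        for j, b in enumerate(codes)
    )

print(prefixes("10110"))                       # ['10110', '1011', '101', '10', '1', '']
print(has_prefix_property(["0", "10", "11"]))  # True
print(has_prefix_property(["0", "01", "11"]))  # False: 0 is a prefix of 01
```

A faster test sorts the codes first: if one code is a prefix of another, it is also a prefix of the code that immediately follows it in sorted order, so only neighbors need to be compared.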

By a presumed message we will mean a finite or infinite sequence $\Phi$
of binary digits such that every prefix of $\Phi$ is a prefix of the enciphered
form of some message. Then, at any given time while a presumed message 
is being sent into a deciphering machine, it is indistinguishable from a 
message, so it makes sense to allow presumed messages as well as mes- 
sages to be the class of sequences which can be sent into a deciphering 
machine. 

III. THE ENCODING THEOREM FOR ALPHABETICAL ENCODINGS 

Consider a discrete source S which uses the alphabet: space, A, B,
..., Z (any other linearly ordered alphabet will also serve). An encoding
of blocks of N letters into binary sequences will be called an alphabetical
encoding if it is uniquely decipherable and the codes for the blocks in
alphabetical (dictionary) order are themselves in numerical order. Here
the codes are imagined to be prefixed by binary points to convert them
into numbers in binary form. The alphabetical encoding of Table I is a
case with N = 1. It is a natural question to ask whether a restriction to
alphabetical encodings may be severe for some sources S. In particular,
are the results of Shannon's encoding theorem (Ref. 8, Theorem 9)
still obtainable with alphabetical encodings?

Shannon proved that the output of a discrete source having entropy
H bits per character can be enciphered in a uniquely decipherable manner
into a sequence of binary digits so that the average number of digits
used per character exceeds H by an arbitrarily small amount. Shannon's
construction encodes blocks of N source characters into binary sequences,
using a cost (average number of binary digits per character) $H'_N$ which
satisfies

$$N G_N \le N H'_N \le N G_N + 1. \tag{1}$$

Here, $N G_N$ is Shannon's notation for the information contained in a
block of N characters produced by the source; i.e.,

$$N G_N = -\sum_i p_i \log_2 p_i, \tag{2}$$

in which the $p_i$ are the N-gram probabilities of the source. Then, since

$$\lim_{N \to \infty} G_N = H,$$

Shannon's theorem

$$\lim_{N \to \infty} H'_N = H \tag{3}$$

follows from (1). Since $N G_N$ must be a lower bound on the average
number of digits used to encode a block of N characters by any means
whatever, (1) shows that Shannon's construction is not far from the
best possible one for block encoding. We now give a similar theorem
for alphabetical encoding.
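For the independent-letter source assumed in Table I, the N-gram probabilities are products of letter probabilities, so $G_N$ does not depend on N and equals H. The sketch below (our own, using only Python's `math` module) evaluates (2) for N = 1 and N = 2 with the probabilities of Table I. The value of $G_1$ it gives, about 4.08 bits, is consistent with the figure $G_1 + 2 = 6.08$ quoted later in this section, and lies below both costs of Table I, as the lower bound requires.

```python
import math

# Letter probabilities of Table I: the space symbol, then A to Z.
P = [float(s) for s in (
    "0.1859 0.0642 0.0127 0.0218 0.0317 0.1031 0.0208 0.0152 0.0467 "
    "0.0575 0.0008 0.0049 0.0321 0.0198 0.0574 0.0632 0.0152 0.0008 "
    "0.0484 0.0514 0.0796 0.0228 0.0083 0.0175 0.0013 0.0164 0.0005"
).split()]

def block_information(block_probs) -> float:
    """N * G_N from (2): minus the sum of p_i * log2(p_i) over the N-gram probabilities, in bits."""
    return -sum(p * math.log2(p) for p in block_probs if p > 0)

g1 = block_information(P)                                  # N = 1
g2 = block_information([p * q for p in P for q in P]) / 2  # N = 2, independent letters
print(f"G_1 = {g1:.2f} bits, G_2 = {g2:.2f} bits")
```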

Theorem 1: Let S be a source producing messages which may be ordered
(alphabetically). Let $G_N$ be computed from the N-gram probabilities $p_i$ of
S by (2). There exists a uniquely decipherable alphabetical encoding of
blocks of N characters of S into sequences of binary digits for which the
cost, $H_N$, satisfies

$$N G_N \le N H_N \le N G_N + 2. \tag{4}$$

By picking N large enough, $H_N$ may be made arbitrarily close to the entropy
H of S in bits per character.

Proof: The proof is adapted from Shannon's proof of his Theorem 9 (Ref. 8).
Let all possible blocks of N source characters be listed in alphabetical
order, and let $p_i$ denote the probability of the ith block in the list (recall
that Shannon lists his blocks in order of probability rather than
alphabetically). Let $m_i$ be the integer for which

$$2^{-m_i} \le p_i < 2^{1-m_i}.$$

Also, let numbers $A_1, A_2, A_3, \ldots$ be defined by

$$A_i = (p_1 + \cdots + p_{i-1}) + \frac{p_i}{2}. \tag{5}$$

Note that

$$0 < A_1 < A_2 < \cdots < 1.$$

We now construct an alphabetical encoding. The code for the ith block
will be the first $m_i + 1$ digits of the binary expansion of the number $A_i$.
In Shannon's encoding this same block has a code formed by expanding
a (different) number to $m_i$ places. Then our scheme uses only one more
digit than does Shannon's for each block, $N H_N = N H'_N + 1$, and (4)
follows from (1). It remains now to show that our encoding is uniquely
decipherable; i.e., that the sequence of letters generated by S may be
reconstructed from the binary digits.

It suffices to prove that our construction produces a list of codes which 
have the prefix property. Then the enciphered message produced by 
each block of N letters may be deciphered as soon as all its digits have
been received. 

To prove that our list has the prefix property, consider any two blocks
of letters, say the ith and the jth with i < j. By (5),

$$A_j \ge A_i + \frac{p_i}{2} + \frac{p_j}{2},$$

and

$$A_j \ge A_i + 2^{-1-m_i} + 2^{-1-m_j}. \tag{6}$$

If $p_i \le p_j$, then $m_i \ge m_j$; but, by (6), the jth code cannot be identically
the same as the first $1 + m_j$ places of the ith code. Similarly, if $p_i \ge p_j$,
the ith code cannot be a prefix of the jth code. Thus the prefix property,
and the theorem, follow.
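The construction in this proof can be sketched with exact rational arithmetic. In the sketch below (our own; names such as `theorem1_codes` are hypothetical) the blocks are the single letters of Table I (N = 1), listed alphabetically. It reproduces the unshortened codes that Table II lists for Space, A, B and C, and the test checks the increasing order and the prefix property that the proof establishes.

```python
from fractions import Fraction

# Table I probabilities in alphabetical order (space symbol first, then A to Z).
PROBS = [Fraction(s) for s in (
    "0.1859 0.0642 0.0127 0.0218 0.0317 0.1031 0.0208 0.0152 0.0467 "
    "0.0575 0.0008 0.0049 0.0321 0.0198 0.0574 0.0632 0.0152 0.0008 "
    "0.0484 0.0514 0.0796 0.0228 0.0083 0.0175 0.0013 0.0164 0.0005"
).split()]

def exponent_m(p: Fraction) -> int:
    """The integer m with 2**-m <= p < 2**(1 - m), for 0 < p <= 1."""
    m = 0
    while Fraction(1, 2 ** m) > p:
        m += 1
    return m

def binary_digits(x: Fraction, n: int) -> str:
    """The first n digits of the binary expansion of a number 0 <= x < 1."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = int(x >= 1)
        digits.append(str(bit))
        x -= bit
    return "".join(digits)

def theorem1_codes(probs: list) -> list:
    """Code i: the first m_i + 1 binary digits of A_i = p_1 + ... + p_(i-1) + p_i / 2."""
    codes = []
    before = Fraction(0)  # p_1 + ... + p_(i-1)
    for p in probs:
        codes.append(binary_digits(before + p / 2, exponent_m(p) + 1))
        before += p
    return codes

codes = theorem1_codes(PROBS)
print(codes[:4])  # ['0001', '00110', '01000001', '0100011'], as in Table II
```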
Except in the case of an alphabet having only one letter, the prefix
property is sufficient to ensure unique decipherability, but it is not
necessary. For example, the list 0, 01, 11 does not have the prefix
property; still it could be used. In a received message 00001111...
there would be no doubt about the first three 0's, and the fourth would be
recognized as 01 or not according to whether an odd or even number of
1's followed it.
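Deciphering by enumeration, mentioned in Section II, can be sketched for this list. The function below (hypothetical name `all_decipherings`) returns every message whose enciphered form is exactly the given finite sequence; an encoding is uniquely decipherable when this list always has exactly one entry. The letter names a, b, c for the codes 0, 01, 11 are ours.

```python
def all_decipherings(bits: str, encoding: dict) -> list:
    """Every message whose enciphered form is exactly bits, found by enumeration."""
    if any(code == "" for code in encoding.values()):
        raise ValueError("codes must be non-empty")
    found = []

    def extend(start: int, letters: list) -> None:
        if start == len(bits):
            found.append(list(letters))
            return
        for letter, code in encoding.items():
            if bits.startswith(code, start):  # this code matches at position start
                letters.append(letter)
                extend(start + len(code), letters)
                letters.pop()

    extend(0, [])
    return found

# The list 0, 01, 11 of the text; the letter names are hypothetical.
EXAMPLE = {"a": "0", "b": "01", "c": "11"}
print(all_decipherings("00001111", EXAMPLE))  # [['a', 'a', 'a', 'a', 'c', 'c']]
print(all_decipherings("000111", EXAMPLE))    # [['a', 'a', 'b', 'c']]
```

With an even number of 1's the fourth 0 stands alone; with an odd number it must begin 01, exactly as the text argues.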

However, by a best alphabetical encoding we will mean an encoding 
which has the lowest cost among all alphabetical encodings which have 
the prefix property. This insistence upon the prefix property will make 
it possible for us to prove Theorems 2 through 5 and give constructive 
methods for finding these best alphabetical encodings. 

If we use the construction just described to design an alphabetical
encoding of English with N = 1, we obtain a cost of 5.75 digits per
character. As guaranteed by the theorem, this cost is less than
$G_1 + 2 = 6.08$. However, we could have done better by simply assigning a
five-digit code to each letter. The encoding can be much improved by
deleting some digits which are obviously not needed. For example, the
first few codes are those listed in Table II. Clearly the code 00110 for A
is too long. As soon as the prefix 001 is received, A is the only possibility.
The final digits 10 may be deleted. Similarly, the other codes may be
shortened, as indicated in Table II, until no code can lose a final digit
without becoming a prefix for some other code. The cost is thereby
reduced to 4.44 digits per character.

Table II

Letter   Code       Shortened Code
Space    0001       000
A        00110      001
B        01000001   010000
C        0100011    010001
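The shortening step can be sketched as a loop that keeps deleting final digits until no code can lose one without becoming a prefix of another code (or having another code as its prefix). This is our illustration, with hypothetical names. Only the four codes of Table II are supplied, and for these four the loop already yields the shortened codes listed there; applying it to a full encoding would mean supplying every code, since each code is limited by its neighbors.

```python
def shorten_codes(codes: dict) -> dict:
    """Delete final digits while the prefix property survives, until none can be deleted."""
    codes = dict(codes)
    changed = True
    while changed:
        changed = False
        for letter in list(codes):
            shorter = codes[letter][:-1]
            if not shorter:
                continue
            others = [c for other, c in codes.items() if other != letter]
            # The shortened code must not be a prefix of, nor have as prefix, any other code.
            if not any(c.startswith(shorter) or shorter.startswith(c) for c in others):
                codes[letter] = shorter
                changed = True
    return codes

# The unshortened codes of Table II.
TABLE_II = {"Space": "0001", "A": "00110", "B": "01000001", "C": "0100011"}
print(shorten_codes(TABLE_II))
# {'Space': '000', 'A': '001', 'B': '010000', 'C': '010001'}
```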

A different encoding is obtainable using the same sort of construction
but with

$$A_i = \sum_{j=1}^{i-1} 2^{-m_j} + 2^{-m_i - 1}.$$

The same proof can be used, since (6) still holds. Since the code lengths
are again the numbers $m_i + 1$, the new encoding will have the same cost.
The numbers $A_i$ can now be computed with ease directly in the binary
system, and much of the arithmetic needed for the first construction
may be avoided. However, the kind of shortening used in Table II does
not work as well with the new encoding. All codes (as numbers) are now
less than

$$\sum_i 2^{-m_i}.$$

This number need not be near 1 (typically it is about 3/4). The codes are
then cramped together in a range smaller than (0, 1) and cannot be
shortened as much. For the case of the English source with N = 1, the
new encoding can only be shortened to cost 5.02 digits per letter.
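The second construction replaces each $p_j$ in $A_i$ by the power of two $2^{-m_j}$, so every $A_i$ is a finite binary fraction. The sketch below (our own; names hypothetical) builds these codes for the letters of Table I and computes the bound $\sum_i 2^{-m_i}$. For these probabilities the bound is 1485/2048, about 0.725.

```python
from fractions import Fraction

# Table I probabilities in alphabetical order (space symbol first, then A to Z).
PROBS = [Fraction(s) for s in (
    "0.1859 0.0642 0.0127 0.0218 0.0317 0.1031 0.0208 0.0152 0.0467 "
    "0.0575 0.0008 0.0049 0.0321 0.0198 0.0574 0.0632 0.0152 0.0008 "
    "0.0484 0.0514 0.0796 0.0228 0.0083 0.0175 0.0013 0.0164 0.0005"
).split()]

def exponent_m(p: Fraction) -> int:
    """The integer m with 2**-m <= p < 2**(1 - m), for 0 < p <= 1."""
    m = 0
    while Fraction(1, 2 ** m) > p:
        m += 1
    return m

def binary_digits(x: Fraction, n: int) -> str:
    """The first n digits of the binary expansion of a number 0 <= x < 1."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = int(x >= 1)
        digits.append(str(bit))
        x -= bit
    return "".join(digits)

def dyadic_codes(probs: list) -> list:
    """Codes from A_i = 2**-m_1 + ... + 2**-m_(i-1) + 2**-(m_i + 1), each m_i + 1 digits long."""
    codes = []
    before = Fraction(0)  # 2**-m_1 + ... + 2**-m_(i-1), a finite binary fraction
    for p in probs:
        m = exponent_m(p)
        codes.append(binary_digits(before + Fraction(1, 2 ** (m + 1)), m + 1))
        before += Fraction(1, 2 ** m)
    return codes

codes = dyadic_codes(PROBS)
bound = sum((Fraction(1, 2 ** exponent_m(p)) for p in PROBS), Fraction(0))
print(bound, float(bound))  # 1485/2048 0.72509765625
```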

IV. ENCODING TRICKS

The simple construction just given does not produce the best encoding, 
i.e., the one with least cost. The best encoding can always be found by 
a systematic, although long, calculation which is described in the next 
section. Here we list a few tricks whereby the problem of finding the 
best encoding may be simplified and, in some cases, solved. 

We will describe these results in terms of encoding single letters into
binary form; however, it is to be understood that blocks of N letters
may always be considered the single letters of a larger alphabet. By a
prefix set of an encoding we will mean the set of all letters which have
codes beginning with a given prefix. For example, in the Huffman
encoding of Table I the prefix 0111 has the prefix set consisting of letters B, G,
J, K, P, Q, V, X and Z. In an alphabetical encoding every prefix set
must consist of all letters lying between some two fixed letters in the 
alphabet. 

The tricks to be described enable one to prove that certain collections 
of letters must be prefix sets in any best alphabetical encoding. Whenever 
a prefix set is known the encoding problem can then be reduced as follows 
to one for a smaller alphabet. 

Theorem 2: In a best alphabetical encoding let S be a prefix set for a
prefix $\pi$. Construct a shorter alphabet by replacing the letters of S by a
single new letter, $L'$, occupying their place in alphabetical order and having
as its probability the sum of their probabilities. A best encoding of the new
alphabet gives $L'$ the code $\pi$ and gives every other letter its old code.

Proof: Let $C(L)$ denote the code for letter L in the original best
encoding. Suppose, contrary to the theorem, that the new problem had
a better solution in which L and $L'$ had codes $C'(L)$ and $C'(L')$. One would
then obtain a better solution of the original problem by encoding L into
$C'(L)$. The code for a letter M in the prefix set would be $C(M)$ with
the prefix $\pi$ changed to $C'(L')$.
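The reduction of Theorem 2, and the recovery of codes afterwards, are simple list operations. In the sketch below (hypothetical names), `merge_prefix_set` replaces a run of adjacent letters by one new letter carrying their total probability, and `expand_prefix_set` attaches final digits to the code of the new letter. The example merges Y and Z, and then expands the prefix 1111 that the alphabetical column of Table I uses for U through Z with the final digits 00, 01, 10, 110, 1110, 1111 that the worked example of this section arrives at.

```python
from fractions import Fraction

def merge_prefix_set(alphabet, first, last):
    """Theorem 2 reduction: replace letters first..last (adjacent, inclusive) by one new
    letter placed where they stood, with the sum of their probabilities."""
    members = alphabet[first:last + 1]
    name = "L(" + ",".join(letter for letter, _ in members) + ")"
    total = sum((p for _, p in members), Fraction(0))
    return alphabet[:first] + [(name, total)] + alphabet[last + 1:]

def expand_prefix_set(prefix, final_digits):
    """Codes for the members of a prefix set: the new letter's code followed by their final digits."""
    return {letter: prefix + digits for letter, digits in final_digits.items()}

tail = [("W", Fraction("0.0175")), ("X", Fraction("0.0013")),
        ("Y", Fraction("0.0164")), ("Z", Fraction("0.0005"))]
reduced = merge_prefix_set(tail, 2, 3)
print(reduced[-1][0], float(reduced[-1][1]))  # L(Y,Z) 0.0169

print(expand_prefix_set("1111", {"U": "00", "V": "01", "W": "10",
                                 "X": "110", "Y": "1110", "Z": "1111"}))
```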

Huffman's encoding scheme uses a result similar to Theorem 2 for
nonalphabetical encodings. The two letters of lowest probability must
form a prefix set, and this result is used again and again, until there are
only two letters left and the problem is solved. When the encoding must
be alphabetical one cannot always find a prefix set easily. Some results
in this direction are given by the following theorems. The symbols
$L_1, L_2, \ldots$ are used to represent the letters of the alphabet in order;
$p_1, p_2, \ldots$ will be their probabilities; $C(L_1), C(L_2), \ldots$ will be their
codes in the encoding C and $N_1, N_2, \ldots$ will be the numbers of binary
digits in their codes. Also, if $\Phi$ is any code or any prefix, $N(\Phi)$ will be
used to represent the number of binary digits in $\Phi$.

An encoding will be said to be exhaustive if it encodes an alphabet of
two or more letters in a uniquely decipherable manner and, for every
infinite sequence $x = x_1 x_2 x_3 \cdots$ of binary digits, there is some message
which can be enciphered as x; or if it encodes an alphabet of one letter
by using the null sequence.
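For encodings with the prefix property there is a quick test, a standard fact not stated in this form in the text: a finite encoding of two or more letters with the prefix property is exhaustive exactly when $\sum_i 2^{-N_i} = 1$, the equality case of the Kraft inequality. The sketch below (hypothetical names) applies it to the alphabetical code of Table I and to a deliberately incomplete pair of codes.

```python
from fractions import Fraction

def kraft_sum(codes) -> Fraction:
    """Sum of 2**-N over the codes, where N is the number of digits in each code."""
    return sum((Fraction(1, 2 ** len(code)) for code in codes), Fraction(0))

def is_exhaustive(codes) -> bool:
    """Exhaustiveness test, valid for two or more codes that have the prefix property."""
    return kraft_sum(codes) == 1

# Alphabetical code of Table I.
ALPHABETICAL = ("00 0100 010100 010101 01011 0110 011100 011101 01111 1000 1001000 "
                "1001001 100101 10011 1010 1011 110000 110001 11001 1101 1110 "
                "111100 111101 111110 1111110 11111110 11111111").split()
print(is_exhaustive(ALPHABETICAL))  # True
print(kraft_sum(["0", "10"]))       # 3/4: no message can be enciphered as 111...
```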

Theorem 3: Every best alphabetical encoding is exhaustive. 

Proof: Consider an encoding of an alphabet having two or more
letters which is alphabetical and has the prefix property, but is not
exhaustive. It will be shown that it is not a best encoding. Let x be an
infinite sequence of binary digits such that no message can be enciphered
as x. If any code of the encoding is a prefix of x, remove it from x; after
a finite number of repetitions of this process, an x will be obtained
which has none of the codes as a prefix. Let $\Phi$ be the greatest prefix of
x which is also a prefix of one of the codes. Let $C_i$ be some code of
which $\Phi$ is a prefix. We will use $\Phi 0$ to represent the sequence $\Phi$ followed
by 0. Then either $\Phi 0$ is a prefix of $C_i$, $\Phi 1$ is a prefix of x, and $\Phi 1$ is
not a prefix of any code of this encoding; or else $\Phi 1$ is a prefix of $C_i$,
$\Phi 0$ is a prefix of x, and $\Phi 0$ is not a prefix of any code of this encoding.
Without loss of generality, we assume the second of these alternatives.
Then consider the new encoding which agrees with the old one
for all codes not having $\Phi$ as a prefix, but which has a code $\Phi \psi$ in place
of each code of the form $\Phi 1 \psi$. The new encoding has a lower cost than
the old one, is still alphabetical and still has the prefix property. Hence
the original encoding was not a best alphabetical encoding.

Lemma 1: Let $\pi$ be a prefix. In a best alphabetical encoding, if there is a
code with prefix $\pi 0$ there is one with prefix $\pi 1$. Conversely, if there is a
code with prefix $\pi 1$, there is one with prefix $\pi 0$.

Proof: If $\pi 0$ is a prefix of some code, then by Theorem 3 the sequence $\pi 111 \cdots$
must have some code $C_i$ as a prefix. But by the prefix property, $C_i$
cannot be a prefix of $\pi 0$; hence, $C_i$ has prefix $\pi 1$. The converse is proved
similarly.
Lemma 2: Let $L_a$ be the letter of lowest probability. In a best alphabetical
encoding, $L_a$, together with one of $L_{a+1}$ or $L_{a-1}$, must form a prefix set.

Proof: Suppose $C(L_a)$ ends in 0, say $C(L_a) = \pi 0$, where $\pi$ stands for
some prefix. By Lemma 1, $\pi 1$ is a prefix of $C(L_{a+1})$. If $C(L_{a+1}) = \pi 1$,
we have the desired result. If not, $\pi 10$ must be a prefix of $C(L_{a+1})$.
By Lemma 1 there exist codes with prefix $\pi 11$. A better encoding (and
hence a contradiction) may be had by the following changes: Lengthen
$C(L_a)$ from $\pi 0$ to $\pi 00$. Change all codes of the form $\pi 10 \psi$ to $\pi 01 \psi$.
Shorten all codes of the form $\pi 11 \psi$ to $\pi 1 \psi$. Since the last change applies
to at least one letter (of higher probability than $L_a$), there is a net decrease
in cost.

The proof in the other case [$C(L_a)$ ending in 1] is similar. If, as is the
case with the probabilities of Table I, the least probable letter is at the
end of the alphabet, then this letter has only one neighboring letter and
must form a prefix set with it. Thus, as a first step in Table I, we can
write

$$C(Y) = \pi(Y,Z)0, \qquad C(Z) = \pi(Y,Z)1,$$

where $\pi(Y,Z)$ is some unknown prefix. Then, using Theorem 2, the
problem is reduced to an encoding for a 26-letter alphabet in which Y and Z
have been replaced by a single letter L(Y,Z) of probability 0.0169.
When this new problem is solved, $\pi(Y,Z)$ will be found as the code for
L(Y,Z). The new least probable letter is J or Q, both with the same
probability 0.0008; J, for example, can be in a prefix set with either I
or K, but Lemma 2 gives no clue for deciding which one. One might
hope that one can always pick the less probable neighbor, K in this
case. However, it is easy to find counter-examples which disprove this
conjecture. A weaker, but true, theorem is the following one.
Theorem 4: Let $L_a$ be the letter of lowest probability. Suppose that

$$p_{a+1} > p_a + p_{a-1}. \tag{7}$$

Then $L_a$ and $L_{a-1}$ must form a prefix set in any best alphabetical encoding.
Similarly, if $p_{a-1} > p_a + p_{a+1}$, $L_a$ and $L_{a+1}$ must form a prefix set.

Proof: Suppose (7) holds but that $L_a$ and $L_{a-1}$ do not form a prefix
set. Then, by Lemma 2, $L_a$ and $L_{a+1}$ form a prefix set. The codes for $L_a$
and $L_{a+1}$ must be of the form

$$C(L_a) = \pi 0, \qquad C(L_{a+1}) = \pi 1$$

for some prefix $\pi$. The code $C(L_{a-1})$ must end in 1, say $C(L_{a-1}) = \rho 1$.
For, if $C(L_{a-1})$ were $\rho 0$, Lemma 1 would show that some code has prefix
$\rho 1$ and hence must stand for a letter between $L_{a-1}$ and $L_a$ in the
alphabetical order, an impossibility. Lemma 1 now shows that some other
letters have prefix $\rho 0$.

We consider two cases determined by the numbers $N(\pi)$ and $N(\rho)$ of
digits in $\pi$ and $\rho$:

Case 1: $N(\pi) < N(\rho)$. An improved encoding can be made by changing
$C(L_a)$ from $\pi 0$ to $\pi 01$, $C(L_{a-1})$ from $\rho 1$ to $\pi 00$ and all codes of the form
$\rho 0 \psi$ to $\rho \psi$. The last change, a shortening, affects some codes and so
offsets the lengthening of the least probable code.

Case 2: $N(\rho) \le N(\pi)$. An improvement can be made by shortening
$C(L_{a+1})$ from $\pi 1$ to $\pi$ while changing $C(L_a)$ from $\pi 0$ to $\rho 11$ and $C(L_{a-1})$
from $\rho 1$ to $\rho 10$. That there is a net decrease in cost follows from (7).

The other half of the theorem is proved in a similar way.

Applying Theorem 4 to our reduced problem of Table I, we obtain 
further reductions, producing new letters L(J,K) and L(P,Q) with prob- 
abilities 0.0057 and 0.0160. Now the lowest-probability letter has become 
X, and we need another kind of theorem. 
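The end-of-alphabet remark after Lemma 2 and the hypotheses of Theorem 4 are easy to test mechanically. The function below (hypothetical name `forced_pair`) takes letter probabilities in alphabetical order and returns the adjacent pair that must form a prefix set, or None when neither rule applies. The examples are three-letter excerpts of the reduced Table I problem centered on the least probable letter; they reproduce the pairings L(J,K) and L(P,Q) and the impasse at X.

```python
def forced_pair(probs: list):
    """Indices (k, k + 1) of two adjacent letters that must form a prefix set, or None.

    probs lists letter probabilities in alphabetical order (at least two letters).
    L_a is the least probable letter. At either end of the alphabet it has only one
    neighbor and must be paired with it (remark after Lemma 2); otherwise Theorem 4
    is tried.
    """
    n = len(probs)
    a = min(range(n), key=lambda k: probs[k])
    if a == 0:
        return (0, 1)
    if a == n - 1:
        return (n - 2, n - 1)
    if probs[a + 1] > probs[a] + probs[a - 1]:  # condition (7)
        return (a - 1, a)
    if probs[a - 1] > probs[a] + probs[a + 1]:
        return (a, a + 1)
    return None  # neither hypothesis holds

print(forced_pair([0.0575, 0.0008, 0.0049]))  # I, J, K: (1, 2), giving L(J,K)
print(forced_pair([0.0152, 0.0008, 0.0484]))  # P, Q, R: (0, 1), giving L(P,Q)
print(forced_pair([0.0175, 0.0013, 0.0169]))  # W, X, L(Y,Z): None
```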

Theorem 5: If $L_i$ and $L_j$ ($i < j$) are two letters both of probability
exceeding $p_{i+1} + p_{i+2} + \cdots + p_{j-1}$, then the intervening letters
$L_{i+1}, L_{i+2}, \ldots, L_{j-1}$ form a prefix set in any best alphabetical encoding.

Proof: Let $\pi$ denote the greatest common prefix of $C(L_i)$ and $C(L_j)$,
i.e., a prefix such that $\pi 0$ is a prefix of $C(L_i)$ while $\pi 1$ is a prefix of $C(L_j)$.
The intervening letters have either $\pi 0$ or $\pi 1$ as prefixes. Supposing that
there are some intervening letters with prefix $\pi 0$, we assert that the
intervening letters with prefix $\pi 0$ form a prefix set. To prove this assertion,
let the intervening letters with prefix $\pi 0$ be $L_{i+1}, \ldots, L_c$, where $C(L_{c+1})$
has prefix $\pi 1$. Let $\pi 0 \rho$ denote the greatest common prefix of $C(L_i)$ and
$C(L_c)$. Then $C(L_c)$ must have prefix $\pi 0 \rho 1$; otherwise, by Lemma 1, $L_{c+1}$
would have prefix $\pi 0 \rho$, and hence $\pi 0$. Also, $C(L_i)$ has prefix $\pi 0 \rho 0$;
otherwise, $\pi 0 \rho 1$ would be a greater common prefix than $\pi 0 \rho$. The assertion
requires only that we prove that $C(L_{i+1})$ has prefix $\pi 0 \rho 1$, for then the
letters in question and no others have this prefix. If, on the contrary,
$C(L_{i+1})$ has prefix $\pi 0 \rho 0$, find the greatest common prefix $\pi 0 \rho 0 \sigma$ such
that $\pi 0 \rho 0 \sigma 0$ is a prefix of $C(L_i)$ and $\pi 0 \rho 0 \sigma 1$ is a prefix of $C(L_{i+1})$.
Now shorten all codes of the form $\pi 0 \rho 0 \sigma 0 \psi$ to $\pi 0 \rho 0 \sigma \psi$ and lengthen all
other codes $\pi 0 \rho \psi$ to $\pi 0 \rho 1 \psi$. The shortened codes include the one for
$L_i$, which has more probability than the total probability of all the
lengthened codes. The assertion is now proved, and likewise the intervening
letters with prefix $\pi 1$ form a prefix set.

By our two assertions, each of $C(L_{i+1}), \ldots, C(L_{j-1})$ has one of two
prefixes, which we may call $\pi 0 \rho 1$ and $\pi 1 \tau 0$, while $\pi 0 \rho 0$ is a prefix of
$C(L_i)$ and $\pi 1 \tau 1$ is a prefix of $C(L_j)$. Again, one proves the theorem by
making changes which put the intervening letters into a single prefix
set. There are two cases:

Case 1: $N(\pi 0 \rho) \le N(\pi 1 \tau)$. Lengthen codes $\pi 0 \rho 1 \psi$ to $\pi 0 \rho 10 \psi$.
Change codes $\pi 1 \tau 0 \psi$ to $\pi 0 \rho 11 \psi$. Shorten all codes $\pi 1 \tau 1 \psi$ to $\pi 1 \tau \psi$.
The intervening letters now form a prefix set with prefix $\pi 0 \rho 1$ and the
new encoding has smaller cost.

Case 2: $N(\pi 1 \tau) \le N(\pi 0 \rho)$. By changes similar to those of Case 1, one
may reduce the cost by making the intervening letters into a prefix set
with prefix $\pi 1 \tau 0$.

Applying Theorem 5 to Table I, we now recognize new prefix sets and 
reduce the problem by introducing new letters L(F,G) and L(U,V, 
W,X,Y,Z) of probabilities 0.0360 and 0.0668. Now L(J,K) becomes 
the least probable letter, Theorem 4 applies, and we form a new letter 
L(J,K,L) of probability 0.0378. Next, Theorem 4 applies to letter B, and 
we form a new letter L(B,C) of probability 0.0345. Again we are at an 
impasse. 

Theorem 6: If $p_1 < p_3$, then $L_1$ and $L_2$ form a prefix set in any best
alphabetical encoding. Similarly, if $L_n$ is the last letter of the alphabet,
$L_{n-1}$ and $L_n$ must form a prefix set if $p_n < p_{n-2}$.

Proof: If $p_1 < p_3$ and $L_1$ and $L_2$ are not a prefix set, then $C(L_1)$, $C(L_2)$
and $C(L_3)$ may be shown to have the forms $\pi 0$, $\pi 1 \rho 0$ and $\pi 1 \rho 1 \psi$. Then
one could improve the encoding by changing $C(L_1)$ to $\pi 00$, $C(L_2)$ to
$\pi 01$ and all codes $\pi 1 \rho 1 \psi$ to $\pi 1 \psi$.
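Theorems 5 and 6 can be tested in the same way. The sketch below (hypothetical names) checks the hypothesis of Theorem 5 for a pair of letters and lists the end-of-alphabet pairs forced by Theorem 6. It is applied to the 26-letter alphabet obtained from Table I by merging Y and Z, where it confirms that F and G lie between two more probable letters, E and H, and that X and L(Y,Z) must form a prefix set, as noted in the paragraph that follows.

```python
def theorem5(probs: list, i: int, j: int) -> bool:
    """True if L_i and L_j both exceed p_(i+1) + ... + p_(j-1), so that the
    intervening letters L_(i+1), ..., L_(j-1) must form a prefix set."""
    between = sum(probs[i + 1:j])
    return j > i + 1 and probs[i] > between and probs[j] > between

def theorem6(probs: list) -> list:
    """Pairs of indices that Theorem 6 forces into prefix sets at the two ends."""
    pairs = []
    if len(probs) >= 3 and probs[0] < probs[2]:
        pairs.append((0, 1))
    if len(probs) >= 3 and probs[-1] < probs[-3]:
        pairs.append((len(probs) - 2, len(probs) - 1))
    return pairs

# Table I after merging Y and Z: space, A, ..., X, then L(Y,Z) with probability 0.0169.
REDUCED = [float(s) for s in (
    "0.1859 0.0642 0.0127 0.0218 0.0317 0.1031 0.0208 0.0152 0.0467 "
    "0.0575 0.0008 0.0049 0.0321 0.0198 0.0574 0.0632 0.0152 0.0008 "
    "0.0484 0.0514 0.0796 0.0228 0.0083 0.0175 0.0013 0.0169"
).split()]
print(theorem5(REDUCED, 5, 8))  # E (index 5) and H (index 8): True
print(theorem6(REDUCED))        # [(24, 25)]: X and L(Y,Z)
```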

This theorem provides no further reduction of our example. Note, however, that it might have been applied following the creation of L(Y,Z) to prove that X, Y, Z form a prefix set. This information is helpful when we must add the final digits to the prefix $\pi(U,V,\ldots,Z)$ to form the codes for U, ..., Z. Using Huffman's encoding method, we find, disregarding questions of alphabetical order, the best way of encoding four letters which have probabilities in the same ratio as our letters U, V, W and L(X,Y,Z). The solution gives each letter two digits. Then an equally good alphabetical encoding gives these letters the codes 00, 01, 10, 11. We now know parts of the codes sought, as summarized in Table III. The unknown prefixes $\pi(B,C)$, ... are to be determined by finding a best alphabetical encoding of the 17-letter alphabet listed in Table IV.
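The two-digit result for U, V, W and L(X,Y,Z) can be checked with a small Huffman code-length routine. This is a sketch in Python, not the authors' procedure; the function name `huffman_lengths` is ours, and the individual probabilities of U, V, W, X, Y and Z are assumed from Table I (they are not listed in this section, but their total, 0.0668, matches the value quoted above).

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Return the Huffman code length of each letter, in input order."""
    if len(probs) == 1:
        return [0]
    tie = count()  # tie-breaker so heapq never has to compare the index lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        for i in a + b:  # every letter under the merged node gains one digit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), a + b))
    return lengths

# U, V, W and the combined letter L(X,Y,Z); individual values assumed from Table I.
probs = [0.0228, 0.0083, 0.0175, 0.0013 + 0.0164 + 0.0005]
print(huffman_lengths(probs))  # [2, 2, 2, 2]
```

Because all four lengths are equal, assigning 00, 01, 10, 11 in alphabetical order costs exactly as much as the Huffman encoding.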

Table III

Letter   Code
B        $\pi(B,C)0$
C        $\pi(B,C)1$
F        $\pi(F,G)0$
G        $\pi(F,G)1$
J        $\pi(J,K,L)00$
K        $\pi(J,K,L)01$
L        $\pi(J,K,L)1$
P        $\pi(P,Q)0$
Q        $\pi(P,Q)1$
U        $\pi(U,\ldots,Z)00$
V        $\pi(U,\ldots,Z)01$
W        $\pi(U,\ldots,Z)10$
X        $\pi(U,\ldots,Z)110$
Y        $\pi(U,\ldots,Z)1110$
Z        $\pi(U,\ldots,Z)1111$

Again we might try a Huffman encoding for Table IV. However, we note in advance that M and L(P,Q) are much less probable than their neighbors. A Huffman encoding would then give these letters such long codes that no alphabetical encoding could use the same code lengths for every letter. To circumvent this difficulty we use Lemma 2, first on L(P,Q) and next on M, and conclude that L(P,Q) must form a prefix set with O or R, and M must form a prefix set with L(J,K,L) or N. There are then four new alphabets to consider, and we have constructed Huffman encodings for each one. The one with smallest cost is the one in which J, K, L, M and P, Q, R were made into new letters. The numbers of digits which this Huffman encoding requires for the letters of Table IV are listed in that table. We next look for an alphabetical encoding which uses the same numbers of digits. Such an encoding actually exists, and so we obtain the best alphabetical encoding shown in Table I. It must be admitted that we were somewhat lucky to be able to reduce the problem to one in which one of the best possible encodings, disregarding alphabetical order, includes an alphabetical encoding. Undoubtedly, minor changes in the probabilities in Table I might make the problem much harder. In the next section we give an encoding method which will apply in all cases.

Table IV

Letter        Probability   Number of Digits
Space         0.1859        2
A             0.0642        4
L(B,C)        0.0345        5
D             0.0317        5
E             0.1031        4
L(F,G)        0.0360        5
H             0.0467        5
I             0.0575        4
L(J,K,L)      0.0378        5
M             0.0198        5
N             0.0574        4
O             0.0632        4
L(P,Q)        0.0160        5
R             0.0484        5
S             0.0514        4
T             0.0796        4
L(U,...,Z)    0.0668        4
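The last step, finding an alphabetical encoding with prescribed code lengths, can be sketched as a left-to-right placement of binary intervals. This greedy routine is our own illustration, not a method stated in the paper; `alphabetical_codes` is a hypothetical name, and all lengths are assumed to be at least 1. A code of N digits is treated as the interval $[a, a + 2^{-N})$ of $[0, 1)$, and each letter is placed at the leftmost N-digit-aligned position not before the end of the previous letter's interval.

```python
from fractions import Fraction

def alphabetical_codes(lengths):
    """Order-preserving prefix codes with the given lengths, or None if the
    letters' intervals cannot be fitted into [0, 1) in alphabetical order."""
    codes, pos = [], Fraction(0)
    for n in lengths:
        width = Fraction(1, 2 ** n)
        slot = -(-pos // width)           # ceiling of pos / width: first aligned slot
        if (slot + 1) * width > 1:
            return None
        codes.append(format(slot, "b").zfill(n))
        pos = (slot + 1) * width          # next letter must start at or after this
    return codes

letters = ["Space", "A", "L(B,C)", "D", "E", "L(F,G)", "H", "I", "L(J,K,L)",
           "M", "N", "O", "L(P,Q)", "R", "S", "T", "L(U,...,Z)"]
lengths = [2, 4, 5, 5, 4, 5, 5, 4, 5, 5, 4, 4, 5, 5, 4, 4, 4]   # Table IV
for letter, code in zip(letters, alphabetical_codes(lengths)):
    print(letter, code)

print(alphabetical_codes([2, 1, 2]))   # None
```

The last call shows why some luck was needed: lengths 2, 1, 2 satisfy $\sum 2^{-N_i} = 1$, yet no alphabetical encoding of three letters can use them.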

V. THE GENERAL ALPHABETIZING ALGORITHM 

The method which will be used in general builds up the best alphabetical encoding for the entire alphabet by first making best alphabetical encodings for certain subalphabets. In particular, the only subalphabets which will be considered are those which might form a prefix set in some alphabetical binary encoding of the whole alphabet. Since a prefix set of an alphabetical encoding must consist of exactly those letters which lie between some pair of letters (inclusive), we will call such a set an allowable subalphabet.

We will denote the allowable subalphabet consisting of all of those letters which follow $L_i$ in the alphabet (including $L_i$ itself) and which precede $L_j$ (again including $L_j$ itself) by $(L_i, L_j)$. When referring to the ordinary English alphabet of Table I we will use the symbol # for the space symbol. Thus, (#,B) will be the subalphabet containing the three symbols space, A and B, and (A,A) will be used to denote the subalphabet containing only the letter A.

If it were desired to find an optimum encoding satisfying certain kinds of restrictions other than the alphabetical one, different allowable subalphabets could be used, with the rest of the algorithm remaining analogous. This method of building up an encoding by combining encodings for subalphabets is analogous to the method used by Huffman,¹ except that he was able to organize his algorithm so that no subalphabets were used except those which actually occurred as prefix sets in his final encoding. By contrast, we consider all allowable subalphabets, including some which are not actually used as part of the final encoding.

The term cost of an encoding has been used to refer to the average number of binary digits per letter of transmitted message, that is, $\sum_i p_i N_i$. Since, in the algorithm to be described, we will be constructing an encoding for each allowable subalphabet, we will also use the corresponding sum for each subalphabet. But, since the probabilities $p_i$ do not add up to 1 for proper subalphabets, the sum $\sum_i p_i N_i$ does not correspond exactly to a cost of transmitting messages, and so the corresponding sum will be called a partial cost.

The algorithm to be described takes place in n stages, where n is the number of letters in the alphabet. At the kth stage, the best alphabetical binary encoding for each k-letter allowable subalphabet will be constructed and its partial cost will be computed. For k = 1, each subalphabet of the form $(L_i, L_i)$ will be encoded by the trivial encoding which encodes $L_i$ with the null sequence; it has partial cost 0, since the number of digits in the null sequence is zero. For k = 2, each subalphabet of the form $(L_i, L_{i+1})$ will be encoded by letting the code for $L_i$ be 0 and the code for $L_{i+1}$ be 1. The partial cost of this encoding is $p_i + p_{i+1}$. In general, the kth stage of the algorithm, in which it is desired to find the best alphabetical binary encoding for each subalphabet of the form $(L_i, L_{i+k-1})$ and its partial cost, proceeds by making use of the codes and the partial costs computed in the previous stages.

For each j between i + 1 and i + k − 1, we can define a binary alphabetical encoding as follows. Let $C_i, C_{i+1}, \ldots, C_{j-1}$ be the codes for $L_i, L_{i+1}, \ldots, L_{j-1}$ given by the (previously constructed) best alphabetical encoding for $(L_i, L_{j-1})$, and let $C_j, C_{j+1}, \ldots, C_{i+k-1}$ be the codes for $L_j, L_{j+1}, \ldots, L_{i+k-1}$ given by the (previously constructed) best alphabetical encoding for $(L_j, L_{i+k-1})$. Then the new encoding for $L_i, L_{i+1}, \ldots, L_{j-1}, L_j, L_{j+1}, \ldots, L_{i+k-1}$ will be $0C_i, 0C_{i+1}, \ldots, 0C_{j-1}, 1C_j, 1C_{j+1}, \ldots, 1C_{i+k-1}$. Such an encoding can be defined for each j, and the encoding is exhaustive. It follows from Theorem 2 that the best encoding for this subalphabet is given by one of the k − 1 such encodings which can be obtained for the k − 1 different values of j. The partial cost of such an encoding made up out of two subencodings is the sum of the partial costs of the two subencodings plus $p_i + p_{i+1} + \cdots + p_{i+k-1}$, since every code gains one leading digit. To perform the algorithm it will not be necessary to construct all of these encodings, but only to compute enough to decide which one of the k − 1 different encodings has the lowest partial cost. This is done by forming the sum for each of the k − 1 pairs of partial costs of subencodings and then constructing only the best encoding.

After the kth stage of this algorithm has been completed for k = 1, 2, ..., n, the final encoding obtained is the best alphabetical encoding for the entire original alphabet, and the final partial cost obtained is the cost of this best alphabetical encoding.
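A compact rendering of the stage-by-stage algorithm follows. It is a sketch under our own conventions: letters are indexed from 0, `cost[i][last]` holds the partial cost of the best encoding of $(L_i, L_{last})$, `split[i][last]` records the chosen j, and ties are broken toward the smallest j. None of these names come from the paper. The loops over k and i, together with the k − 1 candidate splits examined by `min`, give the running time proportional to $n^3$ discussed next.

```python
def best_alphabetical_encoding(probs):
    """Return (cost, codes) of a best alphabetical binary encoding of probs."""
    n = len(probs)
    cum = [0.0]
    for p in probs:
        cum.append(cum[-1] + p)              # cum[b] - cum[a] = p_a + ... + p_{b-1}
    cost = [[0.0] * n for _ in range(n)]     # stage k = 1: null code, partial cost 0
    split = [[None] * n for _ in range(n)]
    for k in range(2, n + 1):                # stage k: all k-letter subalphabets
        for i in range(n - k + 1):
            last = i + k - 1
            # (L_i .. L_{j-1}) gets prefix 0 and (L_j .. L_last) gets prefix 1.
            j = min(range(i + 1, last + 1),
                    key=lambda j: cost[i][j - 1] + cost[j][last])
            # Every code of the subalphabet gains one leading digit.
            cost[i][last] = cost[i][j - 1] + cost[j][last] + (cum[last + 1] - cum[i])
            split[i][last] = j

    codes = [""] * n

    def assign(i, last, bits):
        """Read the codes back from the recorded split points."""
        if i == last:
            codes[i] = bits
            return
        j = split[i][last]
        assign(i, j - 1, bits + "0")
        assign(j, last, bits + "1")

    if n:
        assign(0, n - 1, "")
    return (cost[0][n - 1] if n else 0.0), codes

total, codes = best_alphabetical_encoding([0.330, 0.005, 0.330, 0.005, 0.330])
print(round(total, 3), codes)
```

With the probabilities of Table V (later in this paper) the sketch reports cost 2.335, the cost listed there for the first encoding; the particular codes returned depend on how ties are broken.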

If the above algorithm were performed on a digital computer, the length of time required to do the calculation would be proportional to $n^3$. The innermost inductive loop of the computer program would perform the operation mentioned above of computing sums of pairs of partial costs, and this would be done k − 1 times in the process of encoding each one of the subalphabets considered in the kth stage. But, since there are n − (k − 1) different allowable subalphabets to be encoded in the kth stage, there are (k − 1)[n − (k − 1)] steps to be done in the kth stage. To find the total number of operations done in all of the stages, we sum, and find that

$$\sum_{k=1}^{n} (k-1)\,[\,n-(k-1)\,] = \frac{n^3 - n}{6},$$

which is an identity which can be verified by mathematical induction.
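As a cross-check that avoids induction (a direct summation, offered here only as an alternative to the inductive verification mentioned above), substitute m = k − 1 and use the standard formulas for $\sum m$ and $\sum m^2$:

$$\sum_{k=1}^{n} (k-1)[n-(k-1)] = \sum_{m=0}^{n-1} m(n-m) = n\cdot\frac{(n-1)n}{2} - \frac{(n-1)n(2n-1)}{6} = \frac{(n-1)n\,[3n-(2n-1)]}{6} = \frac{(n-1)n(n+1)}{6} = \frac{n^3-n}{6}.$$

For example, with n = 27 (the space symbol and the 26 letters of Table I), the stages perform 27 · 26 · 28 / 6 = 3276 pair-sum operations in all.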

VI. PROPERTIES OF EXHAUSTIVE ENCODINGS 

We have already shown (Theorem 3) that every best alphabetical encoding is exhaustive. Another reason for considering exhaustive encodings to be of some general interest is given by the following theorem.

Theorem 7: The Huffman binary encoding of any alphabet is exhaustive. 

Proof: We prove by induction that each of the encodings for prefix sets arrived at during the steps of the algorithm of Huffman¹ is an exhaustive encoding. If this holds for the first k encodings constructed during this algorithm, consider the prefix set L encoded at the (k + 1)th step. Let $x = x_1x_2x_3\cdots$ be any infinite sequence of binary digits. It suffices to show that there is some letter whose code is a prefix of x. The set L was made by combining two previous prefix sets of letters, $L'$ and $L''$, and it was encoded by prefixing the codes from their previous encodings by 0 and 1, respectively. Let $\bar{L}$ be whichever of $L'$ and $L''$ had its codes prefixed by $x_1$. If $\bar{L}$ is a single letter, $x_1$ is its code, and hence its code is a prefix of x. But if $\bar{L}$ is a prefix set, then its previous encoding is exhaustive by the inductive hypothesis, and hence there is a letter $L'''$ whose previous code is a prefix of $x_2x_3\cdots$. Then the new code for $L'''$ is a prefix of x.

Several of the properties of exhaustive encodings will be considered, since both the Huffman encoding and the best alphabetical encoding are exhaustive, and it seems likely that exhaustive encodings might arise from other types of optimizing problems. For instance, the shortening procedure used in Table II was essentially a way of making the encoding more nearly exhaustive.

Lemma 3: Whenever an encoding C has the property that for any infinite sequence $x = x_1x_2x_3\cdots$ there is a code of C which is a prefix of x, then

$$\sum_{i=1}^{n} 2^{-N_i} \ge 1, \qquad (8)$$

and equality holds if and only if C has the prefix property.

Proof: Consider the set P of all finite sequences having length exactly k, where k is some fixed integer at least as large as the length of the longest code of C. Then the property assumed in the hypothesis implies that each element of P has at least one of the codes for a prefix. But P has exactly $2^k$ elements, and for each code of length $N_i$ there are $2^{k-N_i}$ elements of P of which it is a prefix. Hence,

$$\sum_{i=1}^{n} 2^{k-N_i} \ge 2^k,$$

which is equivalent to (8), and equality holds if and only if no element of P has two different codes for a prefix. However, the occurrence of two different codes which are prefixes of the same sequence is exactly equivalent to having one of the two codes be a prefix of the other.
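The counting in this proof can be replayed by brute force for small encodings. In this sketch (helper names are ours), `coverage_counts` lists, for every sequence in P, how many codes are prefixes of it; its total equals $2^k$ times the sum in (8), and every entry equals 1 exactly when the prefix property holds. The first code of Table V and a deliberately non-prefix covering set {0, 1, 01} serve as examples.

```python
from fractions import Fraction
from itertools import product

def kraft_sum(codes):
    """Exact value of sum_i 2^(-N_i) for a list of binary code strings."""
    return sum(Fraction(1, 2 ** len(c)) for c in codes)

def has_prefix_property(codes):
    """True if no code is a prefix of a different code."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

def coverage_counts(codes):
    """For each sequence of length k (k = longest code), count the codes that are
    prefixes of it; this is the set P of the proof, enumerated explicitly."""
    k = max(len(c) for c in codes)
    return [sum(seq.startswith(c) for c in codes)
            for seq in ("".join(bits) for bits in product("01", repeat=k))]

for codes in (["000", "001", "01", "10", "11"], ["0", "1", "01"]):
    counts = coverage_counts(codes)
    print(kraft_sum(codes), has_prefix_property(codes), min(counts), max(counts))
```

For the first set every sequence is covered exactly once and the sum is 1; for the second every sequence is covered at least once, but the sum is 5/4 and the prefix property fails.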

Theorem 8: Every exhaustive binary encoding has the prefix property and satisfies

$$\sum_{i=1}^{n} 2^{-N_i} = 1. \qquad (9)$$

Proof: By Lemma 3 and the definition of exhaustive, (8) holds, but, by McMillan,³ unique decipherability implies

$$\sum_{i=1}^{n} 2^{-N_i} \le 1. \qquad (10)$$

Then we combine (8) and (10) to obtain (9). But, by Lemma 3, this implies the prefix property.

Lemma 4: For any exhaustive encoding of an alphabet, and any prefix $\Phi$ of this encoding, the new encoding of the prefix-set subalphabet which associates the new code $\theta$ with each letter whose original code was $\Phi\theta$ is an exhaustive encoding of this subalphabet.

Proof: Given any x, to find a letter whose new code is a prefix of x we consider the letter L whose original code is a prefix of $\Phi x$. Then, by the prefix property, the original code of L cannot be a prefix of $\Phi$ (since $\Phi$ is itself a prefix of some other code), and thus the original code of L is of the form $\Phi\theta$. Hence, L is in the subalphabet, its new code is $\theta$, and $\theta$ is a prefix of x. To complete the proof that the new encoding is exhaustive, note that it has the prefix property because the original encoding does. Hence, the new encoding is either the trivial encoding (of a one-letter alphabet) or is uniquely decipherable.

Lemma 5: For any exhaustive binary encoding of an alphabet having n letters, the total number of prefixes is 2n − 1.

Lemma 6: In any exhaustive binary encoding of an alphabet having n letters, none of the codes consists of more than n − 1 digits.

Each of the last two lemmas associates a number with each exhaustive encoding, and they can be proved by induction on the number of letters in the alphabet. The number associated with each exhaustive encoding is expressed in terms of the numbers associated with the two encodings that are constructed as described in Lemma 4 for the subalphabet having the prefix 0 and the subalphabet having the prefix 1.
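Lemmas 5 and 6 can be spot-checked for small alphabets by generating exhaustive encodings through the same 0/1 split. This sketch assumes that "prefixes" in Lemma 5 include both the null sequence and the complete codes (the reading under which the count is 2n − 1). It generates only alphabetical encodings; since permuting the codes among the letters changes neither the set of prefixes nor the longest code, this covers every exhaustive encoding for these two checks. Function names are ours.

```python
def exhaustive_alphabetical_encodings(n):
    """All exhaustive alphabetical encodings of n letters, built as in Lemma 4:
    the first k letters receive prefix 0 and the remaining n - k prefix 1."""
    if n == 1:
        return [[""]]                      # trivial encoding: the null sequence
    result = []
    for k in range(1, n):
        for left in exhaustive_alphabetical_encodings(k):
            for right in exhaustive_alphabetical_encodings(n - k):
                result.append(["0" + c for c in left] + ["1" + c for c in right])
    return result

def all_prefixes(codes):
    """Every prefix of every code, including the null sequence and the codes."""
    return {c[:i] for c in codes for i in range(len(c) + 1)}

for n in range(1, 8):
    encodings = exhaustive_alphabetical_encodings(n)
    assert all(len(all_prefixes(e)) == 2 * n - 1 for e in encodings)  # Lemma 5
    assert all(max(map(len, e)) <= n - 1 for e in encodings)           # Lemma 6
    print(n, len(encodings))
```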

Theorem 9: The cost of the Huffman encoding of an alphabet is a continuous function of the probabilities of the letters.

Theorem 10: The cost of the best alphabetical encoding of an alphabet is 
a continuous function of the probabilities of the letters. 

The last two theorems will be proved together, enclosing in parentheses the changes which convert the proof of Theorem 9 into a proof of Theorem 10. In fact, what will be proved are the slightly stronger theorems: For two alphabets A and A* having the same n letters, if $p_i$ is the probability of the ith letter of A, $p_i^*$ is the probability of the ith letter of A*, and if k and k* are the costs of the Huffman encoding (best alphabetical encoding) for A and A*, then

$$|k - k^*| \le (n-1) \sum_{i=1}^{n} |p_i - p_i^*|. \qquad (11)$$

If we let B be the right member of inequality (11) and let k′ be the cost of using the Huffman (best alphabetical) encoding of A* as an encoding for A, then, since that encoding is exhaustive (Theorem 7, or Theorem 3) and so, by Lemma 6, none of its codes has more than n − 1 digits, the definition of cost gives $|k' - k^*| \le B$. Since, from the definition of k, $k \le k'$, we can combine these to obtain $k - k^* \le B$. By a similar argument involving k″, the cost of using the Huffman (best alphabetical) encoding of A as an encoding for A*, we obtain $k^* - k \le B$. Combining these, we obtain (11).
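A numerical illustration of inequality (11) for the Huffman case follows. The `huffman_cost` helper uses the standard fact that the average code length of a Huffman encoding equals the sum of the probabilities created by its successive merges. The perturbed distribution `p_star` is hypothetical, chosen only to exercise the bound; the same check could be run with best alphabetical costs.

```python
import heapq

def huffman_cost(probs):
    """Average number of digits per letter of a Huffman encoding of probs."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged          # each merge adds one digit to all letters below it
        heapq.heappush(heap, merged)
    return total

def continuity_bound(p, p_star):
    """Right-hand member B of inequality (11): (n - 1) * sum_i |p_i - p_i*|."""
    return (len(p) - 1) * sum(abs(a - b) for a, b in zip(p, p_star))

p = [0.330, 0.005, 0.330, 0.005, 0.330]        # probabilities of Table V
p_star = [0.320, 0.015, 0.330, 0.005, 0.330]   # hypothetical perturbation
k, k_star = huffman_cost(p), huffman_cost(p_star)
print(round(k, 3), round(k_star, 3), round(continuity_bound(p, p_star), 3))
```

Here the two costs differ by 0.01, well inside the bound of 0.08.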

Theorem 11: The Huffman encoding for a given alphabet has a cost which 
is less than or equal to that of any uniquely decipherable encoding for that 
alphabet. 

Proof: This proof is essentially that of McMillan.³ Let us consider any uniquely decipherable encoding C. We will construct a new encoding C′ which has the same cost as C, and which has the prefix property. However, by its method of construction, the Huffman encoding has a cost which is less than or equal to that of any encoding having the prefix property, completing the proof of the theorem. Let $N_i$ be the number of digits in the code which C associates with the ith letter of the alphabet. Let the letters of the alphabet be renumbered in such a way that $N_i \le N_{i+1}$. Then, as in the encoding theorem (Theorem 1 of this paper, or Theorem 9 of Shannon⁸), we let

$$A_i = \sum_{j=1}^{i-1} 2^{-N_j},$$

and we define C′ to be the encoding which associates with the ith letter the code $C_i'$ obtained by truncating the binary expansion of $A_i$ after $N_i$ digits. Since $N_j \le N_i$ for $j < i$, $A_i$ is a multiple of $2^{-N_i}$; it follows that the digits truncated were 0's, and hence that each $C_i'$ agrees numerically with the corresponding $A_i$. By (10), each of the $A_i$ is less than 1. To show that C′ has the prefix property, we assume that $C_i'$ is a prefix of $C_j'$. Then $i < j$, by the renumbering. However, $A_{i+1} = A_i + 2^{-N_i}$, and hence $A_j \ge A_i + 2^{-N_i}$. Thus, $A_j$ cannot agree with $A_i$ in the first $N_i$ places. Hence, the first $N_i$ digits of $C_j'$ are different from those of $C_i'$, a contradiction.
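The construction of C′ in this proof translates directly into code. This sketch (our function name; exact arithmetic via `fractions.Fraction`) assumes every length is at least 1 and raises an error if inequality (10) fails. Applied to the lengths of the second encoding of Table V, which is uniquely decipherable but lacks the prefix property, it yields a prefix encoding with the same lengths and hence the same cost.

```python
from fractions import Fraction

def prefix_code_from_lengths(lengths):
    """Codes with the given lengths built as in the proof: renumber the letters by
    nondecreasing length, let A_i be the sum of 2^-N_j over earlier letters, and
    take the first N_i binary digits of A_i. Codes are returned in input order."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = [None] * len(lengths)
    a = Fraction(0)                            # running value of A_i
    for i in order:
        n = lengths[i]
        if a + Fraction(1, 2 ** n) > 1:
            raise ValueError("lengths violate inequality (10)")
        codes[i] = format(int(a * 2 ** n), "b").zfill(n)   # A_i is a multiple of 2^-n
        a += Fraction(1, 2 ** n)
    return codes

print(prefix_code_from_lengths([2, 3, 2, 3, 2]))   # ['00', '110', '01', '111', '10']
```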

Theorem 12: If $A_n$ is the number of exhaustive binary alphabetical encodings for an alphabet having n letters, then $A_1 = A_2 = 1$, and for $n \ge 3$ we have

$$A_n = \frac{2\,(2n-3)!}{(n-2)!\,n!}. \qquad (12)$$

Theorem 13: If $T_n$ is the total number of exhaustive binary encodings for an alphabet having n letters, then $T_1 = 1$, $T_2 = 2$ and, for $n \ge 3$, we have

$$T_n = \frac{2\,(2n-3)!}{(n-2)!}. \qquad (13)$$

These theorems show how rapidly $A_n$ and $T_n$ increase with increasing n. Since, by Theorem 3, $A_n$ would be the number of encodings to consider if it were desired to find the best alphabetical encoding by enumeration, Theorem 12 shows that the methods already given in this paper (even the general alphabetizing algorithm) are much faster than exhaustive enumeration. Similarly, Theorem 7 and Theorem 13 show how much slower exhaustive enumeration is than the algorithm given by Huffman.¹

Each of the $A_n$ alphabetical encodings may be converted into n! of the $T_n$ encodings by permuting its codes in all possible ways. It follows that $T_n = n!\,A_n$, and it suffices to prove Theorem 12. Consider for $n \ge 2$ an exhaustive alphabetical encoding of n letters. Some number $k = 1, \ldots, n-1$ of these letters have codes with prefix 0. These k codes, each with its leading digit removed, have been shown (Lemma 4) to form one of the $A_k$ exhaustive alphabetical encodings of k letters. Similarly, the remaining n − k codes, minus their leading digits 1, form one of the $A_{n-k}$ exhaustive alphabetical encodings of n − k letters. Thus, if $n \ge 2$,

$$A_n = \sum_{k=1}^{n-1} A_k A_{n-k}, \qquad (14)$$

while $A_1 = 1$. To solve (14), construct the generating function $a(x) = A_1x + A_2x^2 + A_3x^3 + \cdots$. By (14), $a(x) = x + a^2(x)$; i.e.,

$$a(x) = \tfrac{1}{2}\left(1 - \sqrt{1-4x}\right). \qquad (15)$$

The negative sign of the square root is needed to make a(0) = 0. The series for a(x) is obtained using the binomial theorem with power ½. The coefficient of $x^n$ (which is $A_n$) has the expression (12).
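Recurrence (14) and expressions (12) and (13) are easy to compare numerically. This sketch (helper names are ours) computes $A_n$ from (14) with exact integers, checks it against (12), and checks (13) through $T_n = n!\,A_n$, up to the 27-symbol alphabet of Table I.

```python
from math import factorial

def alphabetical_counts(n_max):
    """A_1, ..., A_{n_max} from recurrence (14), with A_1 = 1 (index 0 unused)."""
    A = [0, 1]
    for n in range(2, n_max + 1):
        A.append(sum(A[k] * A[n - k] for k in range(1, n)))
    return A

def closed_form_A(n):
    """Expression (12) for A_n, n >= 3."""
    return 2 * factorial(2 * n - 3) // (factorial(n - 2) * factorial(n))

def closed_form_T(n):
    """Expression (13) for T_n, n >= 3."""
    return 2 * factorial(2 * n - 3) // factorial(n - 2)

A = alphabetical_counts(27)
for n in range(3, 28):
    assert A[n] == closed_form_A(n)                  # (14) agrees with (12)
    assert factorial(n) * A[n] == closed_form_T(n)   # T_n = n! A_n agrees with (13)
print(A[27], closed_form_T(27))   # encodings an enumeration of Table I would face
```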

VII. ENCODINGS WITHOUT THE PREFIX PROPERTY 

So far in this paper very little has been said about encodings without the prefix property. For instance, we defined the best alphabetical encoding to be the encoding having the lowest cost among all order-preserving encodings having the prefix property. However, since the special encoding given in Table I is an alphabetical encoding and has cost 4.1801, it appears to be advantageous to dispense with the prefix property requirement. Unfortunately, not very much is known about the properties of encodings lacking the prefix property; in fact, it is not known whether the special encoding given in Table I can be further improved. It was not constructed on the basis of any general procedure, but was found by a heuristic method. The next few paragraphs give a few results which we have found about encodings without the prefix property, and also some examples of the difficulties which can arise when using such encodings.

It should be noted that a message which begins with the letter Y in the special encoding cannot be deciphered as soon as the Y has been received; it is necessary to wait for further received digits in order to distinguish it from a Z. In particular, in the case of the message enciphered as 11111101111110 it is necessary to wait for the 14th received binary digit before the first letter can be deciphered.

In general, we will say that the delay of a presumed message is d if it is necessary to wait for the receipt of the first d binary digits before the first transmitted letter can be recognized. We will say that the delay of an encoding is d if d is the least upper bound of the delays of all presumed messages of that encoding. We will say that an encoding has the finite delay property if the delay of that encoding is finite. For instance, the special encoding of Table I has the finite delay property, and in fact has delay 14.

Theorem 14: If an encoding C has infinite delay, then there exists a presumed message of C which has infinite delay.

Proof: Given an encoding C with infinite delay, there exists an infinite sequence of presumed messages $M_1, M_2, M_3, \ldots$ such that $M_i$ has delay at least i. Then either the set of those presumed messages $M_i$ whose first binary digit is 0 or the set whose first binary digit is 1 is an infinite set. We thus can choose an infinite subsequence of presumed messages $M_1', M_2', M_3', \ldots$ such that $M_i'$ has delay at least i and such that all of the messages agree on the first binary digit. Proceeding by induction, we can choose at the kth step a subsequence of presumed messages which all agree on the first k digits. Then the infinite sequence whose kth binary digit is the kth binary digit of all presumed messages remaining after the kth inductive step is a presumed message, and has infinite delay.

For an encoding to be useful in practice, it seems likely that it must have the finite delay property. This would permit a deciphering machine to be built having only a finite amount of memory, and it would permit two-way communication (as in telephony) to be almost instantaneous. However, in delayed communication systems (common in telegraphy) for which a tape is used for storing messages, this tape might be used to provide the unbounded amounts of memory needed to decipher an infinite delay encoding.

To investigate further the problem of designing an optimal-cost encoding of any sort (such as an alphabetical-order encoding) without requiring it to have the prefix property, it should be remarked that the problem is finite, but not necessarily easy to attack. That is, given an alphabet in which all of the letters have positive probability, and given a constant K, there are only a finite number of encodings of this alphabet which have a cost less than K. For if m is the smallest of the probabilities, there are not more than K/m digits in the longest code of any such encoding, and there are only a finite number of encodings of an n-letter alphabet in which each code has length less than K/m. However, this number would be astronomically large for any alphabet of reasonable size.

One particular way of generating encodings which will be used in a few examples below is of some general interest. The reversal of an encoding C is a new encoding (which will be called C* for the remainder of this paper) which is obtained by writing the code for each letter in the reverse order. This interchanges the direction of increasing time, and changes many of the properties of the encoding, but it does preserve unique decipherability.

Table V demonstrates many of the properties and complications of encodings, contrasting one encoding having the prefix property with three other encodings lacking this property. Each of the four encodings shown preserves alphabetical order, and each is uniquely decipherable. The first encoding has the prefix property, and in fact is the best alphabetical encoding in the sense used in this paper. However, it has an appreciably higher cost than any of the other three encodings, none of which has the prefix property. The reversals of each of the last three encodings have the prefix property, but the reversal of the first encoding does not.

Table V

Letter   Probability   First Code   Second Code   Third Code   Fourth Code
A        0.330         000          00            00           00
B        0.005         001          001           0011         00111
C        0.330         01           10            01           01
D        0.005         10           101           0111         01111
E        0.330         11           11            10           10
Cost                   2.335        2.01          2.02         2.03
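The costs in Table V and the statements about reversals can be recomputed directly. This sketch uses exact fractions for the probabilities; the helper names are ours, and "has the prefix property" is tested literally as "no code is a prefix of a different code".

```python
from fractions import Fraction

PROBS = [Fraction("0.330"), Fraction("0.005"), Fraction("0.330"),
         Fraction("0.005"), Fraction("0.330")]
TABLE_V = {
    "first":  ["000", "001", "01", "10", "11"],
    "second": ["00", "001", "10", "101", "11"],
    "third":  ["00", "0011", "01", "0111", "10"],
    "fourth": ["00", "00111", "01", "01111", "10"],
}

def cost(codes, probs=PROBS):
    """Average number of binary digits per letter, sum_i p_i N_i."""
    return sum(p * len(c) for p, c in zip(probs, codes))

def has_prefix_property(codes):
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

def reversal(codes):
    """The reversal C*: every code written in reverse order."""
    return [c[::-1] for c in codes]

for name, codes in TABLE_V.items():
    print(name, float(cost(codes)), has_prefix_property(codes),
          has_prefix_property(reversal(codes)))
# first 2.335 True False
# second 2.01 False True
# third 2.02 False True
# fourth 2.03 False True
```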

The second encoding of Table V has the lowest possible cost of any uniquely decipherable binary encoding, by Theorem 11, since it is the reversal of a Huffman encoding (and reversal leaves every code length, hence the cost, unchanged). However, the second encoding has infinite delay, since the presumed message 001111... has infinite delay: it can be read either as A followed by a run of E's or as B followed by a run of E's. Furthermore, the second encoding, although it preserves the alphabetical order of individual letters, does not preserve the alphabetical order of words made up out of these letters. For instance, the enciphered form of CE, 1011, is a larger binary number than the enciphered form of DA, 10100, although the latter occurs later in alphabetical order. The property of preserving the alphabetical order of all words will be called the strong alphabetical property, and it has already been shown that alphabetical encodings having the prefix property have the strong alphabetical property. Both the alphabetical encoding and the special encoding of Table I have the strong alphabetical property, and so do all of the encodings of Table V except the second. There would be very little to be gained by employing an alphabetical-order encoding for sorting or dictionary purposes unless it had the strong alphabetical property.
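The failure of the strong alphabetical property for the second encoding can be exhibited mechanically by comparing enciphered words as binary fractions, reading a sequence $b_1b_2\cdots b_m$ as $0.b_1b_2\cdots b_m$. The helpers below are our own sketch; `order_violations` searches all words of up to two letters.

```python
from fractions import Fraction
from itertools import product

FIRST = {"A": "000", "B": "001", "C": "01", "D": "10", "E": "11"}
SECOND = {"A": "00", "B": "001", "C": "10", "D": "101", "E": "11"}

def encipher(word, code):
    """Concatenate the codes of the letters of word."""
    return "".join(code[letter] for letter in word)

def as_binary_fraction(bits):
    """Read b1 b2 ... bm as the binary fraction 0.b1 b2 ... bm."""
    return Fraction(int(bits, 2), 2 ** len(bits))

def order_violations(code, max_len=2):
    """Pairs (u, v) of words with u alphabetically before v whose enciphered
    forms are in the opposite numerical order."""
    letters = sorted(code)
    words = ["".join(w) for n in range(1, max_len + 1)
             for w in product(letters, repeat=n)]
    value = {w: as_binary_fraction(encipher(w, code)) for w in words}
    return [(u, v) for u in words for v in words if u < v and value[u] > value[v]]

print(encipher("CE", SECOND), encipher("DA", SECOND))   # 1011 10100
print(("CE", "DA") in order_violations(SECOND))         # True
print(order_violations(FIRST))                          # []
```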

The third encoding lacks these defects of the second encoding, but it has a special one of its own, about which more will be said in the next section. This defect has to do with synchronizing, and it can be explained in this case by the observation that every code of the third encoding has an even number of binary digits. Thus, if the deciphering circuit starts up while it is out of phase, it can never get back in phase: the two phases correspond to the odd-numbered and the even-numbered binary digits. In this case, where there are certain codes which cannot occur, the defect could be remedied by designing the circuit to change phase additionally if it ever receives a code 1011 or 1111, but this adds an extra complication to the circuit. However, the first and second encodings have the property that each of them will automatically get back in synchronism with probability 1, without the addition of any other codes or any other special features to the circuit.

The fourth encoding has none of these defects, and since its cost is so 
near to the least possible, it would undoubtedly be a reasonably good 
choice as a solution, if this particular alphabet had arisen in an actual 
practical problem. 

So far in this paper, each example of an encoding with the finite delay property has had a delay equal to $N_{\max}$, where $N_{\max}$ is the number of digits of the longest code of the encoding. This result does not hold in general, as is illustrated by Table VI. The fifth encoding has $N_{\max} = 6$, but it has delay 8.

Table VI

Letter   Fifth Code   Sixth Code
W        00           00
X        001          01
Y        101          10
Z        110101       11



The encodings having the finite delay property but not the prefix 
property, such as the special encoding of Table I and the fifth encoding 
of Table VI, provide counterexamples which contradict Remark II of 
Schützenberger (Ref. 5, page 55) and provide the example which is
asked for in the sentence following Remark I of the same paper. 

As an alternative to the above method of expressing quantitatively the finite delay property, we may make the following definitions for use later in this paper. We will say that the excess delay of a presumed message is e if it is necessary to wait for the receipt of e binary digits beyond the end of the first transmitted letter of the presumed message before this first letter can be recognized. We will say that the excess delay of an encoding is e if e is the least upper bound of the excess delays of all presumed messages of the encoding.

If d is the delay of an encoding, e is its excess delay, and $N_{\min}$ and $N_{\max}$ are, respectively, the minimum and maximum numbers of digits of any codes of the encoding, then we obviously have $e + N_{\min} \le d \le e + N_{\max}$. Then an encoding has the finite delay property if and only if the excess delay of that encoding is finite. Also, an encoding has the prefix property if and only if the excess delay of that encoding is 0.

VIII. SELF-SYNCHRONIZING PROPERTIES 

Problems of how to make a transmitting device and a receiving device become and remain synchronized with each other are important in the engineering design of many kinds of systems. Since the encodings discussed in this paper are variable-length, it might seem that the synchronizing problem for enciphering and deciphering circuits would be especially difficult. However, the synchronizing problem is very simple for many variable-length binary encodings, because of a particularly favorable property which they possess. These remarks can best be illustrated by an example. Suppose that (using the alphabetical encoding of Table I as an example) a message beginning 1110011110100111000... is received, and we wish to observe how a deciphering circuit would decipher it. Since the encoding has the prefix property, the deciphering circuit should first find a code which is a prefix of this message, and then
decode this to obtain the first letter T of this message. Proceeding with the remaining part, it then finds the letter H, and then the rest of the deciphered version shown in the first line of Table VII, where the symbol ":" is used to mark the divisions between those sequences of binary digits which were deciphered as individual letters.

Table VII

Line 1 (in phase):      1 110:01 111:0 100:11 10:00:    T H A T #
Line 2 (received):      1 110 01 111 0 100 11 10 00
Line 3 (out of phase):  1:110 01:111 0:100 11:10 00:    (end of a letter) R T M I

Next suppose that the same sequence of digits had been received, but that the deciphering circuit was not in synchronism with the enciphering circuit. In particular, suppose that, when the deciphering circuit was first turned on, it was in the state that it would be in if it were partly through the operation of deciphering some letter, and that the initial 1 of the message was interpreted as the last digit of this letter. This deciphering is indicated on the third line of Table VII. Once again, the symbol ":" has been used to mark the divisions between letters. These two decipherings are out of phase (i.e., out of synchronism) with one another at the beginning of the message, but at the end of the received message they are in phase with each other, as is indicated by the fact that the ":" symbols align with each other at the right end of Table VII. This means that the deciphering circuit would have automatically become synchronized, without any special synchronizing circuits or synchronizing pulses being necessary. It was, of course, necessary for at least two of the codes of the encoding to end in the same sequence of digits, but this is very likely to happen for any variable-length encoding, unless special efforts are made to prevent it.
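The self-synchronization in Table VII can be simulated with a simple prefix decoder started at two different phases. The sketch assumes the codes of the seven symbols involved (# for space, A, H, I, M, R, T) as they appear in the decipherments of Table VII; it is not a general deciphering circuit, only an illustration of how the end-of-letter positions of the two phases come to coincide. Names are ours.

```python
# Codes of the symbols met in this example (assumed; # denotes space).
CODES = {"#": "00", "A": "0100", "H": "01111", "I": "1000",
         "M": "10011", "R": "11001", "T": "1110"}
DECODE = {bits: letter for letter, bits in CODES.items()}

def decipher_from(bits, start):
    """Decipher a prefix-code message beginning at position start. Returns the
    letters, the positions (in digits from the message start) where each letter
    ends, and any unfinished trailing digits."""
    letters, ends, buf = [], [], ""
    for pos in range(start, len(bits)):
        buf += bits[pos]
        if buf in DECODE:          # prefix property: the first match is the letter
            letters.append(DECODE[buf])
            ends.append(pos + 1)
            buf = ""
    return letters, ends, buf

message = "1110011110100111000"
in_phase = decipher_from(message, 0)
out_of_phase = decipher_from(message, 1)   # initial 1 taken as the end of an earlier letter
print(in_phase)       # (['T', 'H', 'A', 'T', '#'], [4, 9, 13, 17, 19], '')
print(out_of_phase)   # (['R', 'T', 'M', 'I'], [6, 10, 15, 19], '')
```

The two lists of end positions first agree at digit 19; from there on both phases read the same letters.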

However, if we had been using a fixed-length encoding, such as the sixth encoding of Table VI, in which all of the codes have a fixed length k, there would be exactly k different phases in which the deciphering circuit might find itself, and the circuit could never make a transition between them. No pair of different codes can end in exactly the same sequence of digits, and so no two of these phases can become synchronized. Each of these phases will have all of the codes ending after j digit times, and after k + j, 2k + j, etc., where j is the remainder obtained on dividing the position of the symbol ":" by k, and hence j can take on k different possible values.

Also, even in the case of variable-length encodings, if all of the code lengths are divisible by some integer k, then there will be at least k different phases. For if the position of one occurrence of the symbol ":" has remainder j when divided by k, the positions of all other occurrences of the symbol ":" in this phase of decipherment will have the same remainder.

The above remarks apply strictly to exhaustive encodings, but may 
not apply where there are certain sequences of digits which can never 
occur. For if such a sequence of digits does occur, this may be used by 
the circuit as a special indication that it is out of phase, and hence it 
may be possible to build auxiliary circuits which can cause resynchroni- 
zation, even when a fixed-length encoding is used. So a more complete 
treatment of synchronization would allow such auxiliary circuits, but 
here we will consider only self-synchronization, which is carried out
inherently by the same means as is used for deciphering. 

To speak more precisely about the self-synchronizing properties, we
will make some definitions. Given any encoding C and any

    finite sequences x and y such that x is not the
    enciphered form (with respect to encoding C)              (16)
    of any message, and xy is a presumed message,

if z is a finite sequence of binary digits such that both xyz and yz are
complete enciphered messages, we will say that z is a synchronizing
sequence for x and y. As an example, we have seen in Table VII that
011110100111000 is a synchronizing sequence for 1 and 110.
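The definition can be turned into a direct check. This sketch assumes the encoding is given as a list of code strings. `is_complete` and `is_synchronizing` are illustrative helper names. `is_complete` uses dynamic programming to test whether a digit string splits into codes. The check that xy is a presumed message is omitted. This omission is harmless for exhaustive encodings, because every digit sequence can then occur.

```python
from functools import lru_cache

def is_complete(bits: str, codes: frozenset) -> bool:
    """True if `bits` is the enciphered form of some (possibly empty) message."""
    @lru_cache(maxsize=None)
    def parses_from(i: int) -> bool:
        if i == len(bits):
            return True
        return any(bits.startswith(c, i) and parses_from(i + len(c)) for c in codes)
    return parses_from(0)

def is_synchronizing(z: str, x: str, y: str, codes) -> bool:
    """Definition around (16): x is not a message, and both xyz and yz are."""
    codes = frozenset(codes)
    if is_complete(x, codes):
        raise ValueError("x must not be the enciphered form of a message")
    return is_complete(x + y + z, codes) and is_complete(y + z, codes)

# Hypothetical three-letter exhaustive code {0, 10, 11}: 11|0 and 10 both parse.
print(is_synchronizing("0", "1", "1", ["0", "10", "11"]))          # True
# Fixed length 2: xyz and yz differ in length by one, so one is always odd.
print(any(is_synchronizing(format(v, f"0{n}b"), "0", "0", ["00", "01", "10", "11"])
          for n in range(1, 9) for v in range(2 ** n)))            # False
```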

Given any uniquely decipherable encoding C which has some codes 
of length more than 1, exactly one of the three statements given below 
will hold: 
i. For all x and y satisfying (16), there is no z such that z is a
synchronizing sequence for x and y. The encoding C will then be said to
be never-self-synchronizing.

ii. For all x and y satisfying (16), there is a z which is a synchronizing
sequence for x and y. The encoding C will then be said to be completely
self-synchronizing.

iii. For some x and y satisfying (16), there is a synchronizing sequence
for x and y, but for others there is none. The encoding C will then be
said to be partially self-synchronizing.

Furthermore, we will define a sequence z to be a universal synchronizing
sequence for the encoding C if, for all x and y satisfying (16), this same
sequence z is a synchronizing sequence for x and y.
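For an exhaustive encoding with the prefix property, the phase of the deciphering circuit at any moment is simply the part of a code it has read so far. Because some y can leave the decipherer in any such phase, z is universal exactly when it returns every partial code to a letter boundary. The sketch below assumes both properties, so every step stays within the set of partial codes. The function names are illustrative and do not come from the paper.

```python
def phases(codes):
    """Possible states of a prefix-code decipherer: the proper prefixes of the
    codes, with '' meaning 'at a letter boundary'."""
    return {c[:i] for c in codes for i in range(len(c))}

def step(state, digit, codes):
    s = state + digit
    return "" if s in codes else s     # a completed code returns to the boundary

def is_universal(z, codes):
    """True if z brings every phase back to a letter boundary, so that xyz
    and yz are complete messages for every x and y satisfying (16)."""
    codes = set(codes)
    for s in phases(codes):
        for digit in z:
            s = step(s, digit, codes)
        if s != "":
            return False
    return True

print(is_universal("0", ["0", "10", "11"]))              # True
print(is_universal("1", ["0", "10", "11"]))              # False
print(any(is_universal(format(v, f"0{n}b"), ["00", "01", "10", "11"])
          for n in range(1, 9) for v in range(2 ** n)))  # False
```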

Theorem 15: Given an exhaustive encoding C, then C is completely self- 
synchronizing if and only if there exists a z which is a universal synchroniz- 
ing sequence for C. 

Proof: A universal synchronizing sequence clearly satisfies the condi- 
tions of the definition of completely self-synchronizing, so it remains 
only to construct a universal synchronizing sequence, given that there 
is a synchronizing sequence for every pair x and y satisfying (16). By the
exhaustive property, there is a code consisting entirely of 0's. We will
assume that there are k 0's in this code. We will construct our z by
starting with N_max 0's, where N_max is the length of the longest code of
C; after this, there are only k different phases in which the circuit could
be. Then we find a synchronizing sequence for two of these phases (for
instance, a synchronizing sequence for 00 and 0), and put this next after
our sequence. Next we put on the sequence of N_max 0's again. There are
now at most k - 1 phases to synchronize, and, adding on sequences for
these one at a time, we eventually construct our desired universal
synchronizing sequence.
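Theorem 15 also suggests a computation. The sketch below does not use the pairwise construction of the proof. Instead, it performs a breadth-first search over sets of possible phases, which is the standard search for synchronizing words of a finite automaton. It again assumes an exhaustive prefix code. It returns a shortest universal synchronizing sequence. It returns None when the reachable sets never shrink to the single boundary state; by Theorem 15, that means the encoding is not completely self-synchronizing. The search can take time exponential in the number of phases, so it is meant only for small encodings.

```python
from collections import deque

def shortest_universal_sync(codes):
    """Shortest universal synchronizing sequence of an exhaustive prefix code,
    or None if there is none."""
    codes = set(codes)
    all_phases = frozenset(c[:i] for c in codes for i in range(len(c)))
    boundary = frozenset({""})

    def step(state, digit):
        s = state + digit
        return "" if s in codes else s

    seen = {all_phases: ""}          # set of phases -> shortest digits reaching it
    queue = deque([all_phases])
    while queue:
        current = queue.popleft()
        if current == boundary:
            return seen[current]
        for digit in "01":
            nxt = frozenset(step(s, digit) for s in current)
            if nxt not in seen:
                seen[nxt] = seen[current] + digit
                queue.append(nxt)
    return None

print(shortest_universal_sync(["0", "10", "11"]))               # 0
print(shortest_universal_sync(["00", "01", "10", "11"]))        # None
print(shortest_universal_sync(["000", "0010", "0011", "01", "100",
                               "1010", "1011", "110", "111"]))  # None
```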

The alphabetical encoding of Table I can be shown by Theorem 15 
to be completely self-synchronizing, since the sequence 010001011 is a
universal synchronizing sequence for this encoding. The message AD
has this sequence as its enciphered form. In addition, there are many
other short universal synchronizing sequences for this encoding, such
as the enciphered forms of #Y, AY, BD, BY, EY, HI, ID, JO, JU,
MW, NY, OW, PO, PU, TY, etc. Since the digraphs listed here alone
account for about three per cent of all digraphs in connected English
text [9], it can be seen that, if English text were transmitted by use of
this encoding, it would be quite likely to synchronize itself very quickly.

In fact, it is easy to see that any exhaustive encoding which is com- 
pletely self-synchronizing will synchronize itself with probability 1 if 
the messages sent have the successive letters independently chosen with 
any given set of probabilities, assuming only that all of these probabilities
are positive numbers. This will occur since the probability of a universal 
synchronizing sequence occurring at any given time is positive, and, if 
we wait long enough, this will have happened with probability 1. 

The fact that this occurs with probability 1 does not make it quite 
certain to occur, and, in fact, it is possible to choose arbitrarily long 
sequences of English words which do not contain a universal synchroniz- 
ing sequence. An example of such a sequence for the alphabetical encod- 
ing of Table I is 

CHECK # SYNCHRONISM # OF # LONG # FILTHY
# CHUCKLE # HEH # HEH # HEH # HEH

But such a sequence is extremely unlikely to continue indefinitely in any 
practical communication system or record-keeping system. Also, slight 
complications of the encoding could permit certain sequences which are 
certain to occur in English text (such as a period followed by a space 
symbol) to be universal synchronizing sequences. 

One quality worth comparing among the encodings proposed for a
particular use is the average speed with which they synchronize
themselves when carrying typical traffic. This speed could be calculated
from a sufficiently good knowledge of the statistics of the traffic, but it
could more easily be measured experimentally, either by the use of actual
enciphering and deciphering circuits, or by simulating their behavior on a
digital computer.
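A computer simulation of the kind mentioned above might look like the following sketch. It makes several assumptions. The encoding is a prefix code. Letters are chosen independently, and each letter is represented by its code. The letter probabilities are hypothetical. The function encodes a random message and starts a second decipherer one digit late. It then counts how many digits that decipherer reads before one of its letter boundaries coincides with a boundary of the correct phase. From that point on, both decipherers agree.

```python
import random

def digits_to_sync(code_probs, offset, n_letters, rng):
    """Digits read by a decipherer starting `offset` digits late before it
    first ends a letter where the correct phase does (None if it never does).
    `code_probs` maps each code (standing for its letter) to a probability."""
    codes = list(code_probs)
    msg = "".join(rng.choices(codes, weights=[code_probs[c] for c in codes], k=n_letters))
    true_ends, buf = set(), ""
    for i, digit in enumerate(msg):          # letter boundaries of the correct phase
        buf += digit
        if buf in code_probs:
            true_ends.add(i + 1)
            buf = ""
    buf = ""
    for i in range(offset, len(msg)):        # the late-starting decipherer
        buf += msg[i]
        if buf in code_probs:
            buf = ""
            if i + 1 in true_ends:           # boundaries coincide: in phase from here on
                return i + 1 - offset
    return None

def mean_sync_delay(code_probs, trials=2000, n_letters=200, seed=0):
    """Average of digits_to_sync over random messages (None if no trial synchronized)."""
    rng = random.Random(seed)
    delays = [d for d in (digits_to_sync(code_probs, 1, n_letters, rng)
                          for _ in range(trials)) if d is not None]
    return sum(delays) / len(delays) if delays else None

# Hypothetical letter probabilities for the code {0, 10, 11}.
print(mean_sync_delay({"0": 0.5, "10": 0.3, "11": 0.2}))
# A fixed-length code never gets back in phase, so nothing is averaged.
print(mean_sync_delay({"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}))   # None
```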

The synchronization problem occurs not only when the equipment 
is first turned on, but also in transmission systems for which there is a 
noisy channel. For if some digits of a message encoded in a variable- 
length encoding are changed, the change may cause the circuits to get 
out of synchronism by the change of a short code into the prefix of a 
long one, or vice versa. Also, of course, temporary malfunctions of the 
enciphering or deciphering circuits themselves might cause them to get
out of phase. 

It may be of interest to enumerate the known results about combina- 
tions of synchronizing properties and lengths of the codes of exhaustive 
encodings. 

If an exhaustive encoding has a fixed length (all codes having length 
the same integer k), then it must be 

never-self-synchronizing. (17)

If an exhaustive encoding has all the lengths of its codes divisible by 
some integer k > 1, but these lengths are not all equal to k, then it
must be one of the following:

never-self-synchronizing, (18)

partially self-synchronizing. (19) 

If an exhaustive encoding has the greatest common divisor of the 
lengths of its codes equal to 1, then it must be one of the following: 

completely self-synchronizing, (20) 

partially self-synchronizing, (21) 

never-self-synchronizing. (22) 

Of the above six cases, (17), (19) and (20) occur very much more 
commonly than the others. In fact, it is very difficult to construct 
examples of the other three, unless you deliberately set out to do so. 
The following theorems indicate why cases (18) and (22) are hard to
obtain.

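Theorem 16: Given an exhaustive encoding which is never-self-synchronizing,
if we let

$$Q = \sum_{i} N_i \, 2^{-N_i}, \qquad (23)$$

where N_i is the length of the ith code and the sum runs over all the codes
of the encoding, then Q will always be an integer.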

It can be seen that, in the case of a fixed-length code, Q will be the
length. However, none of the exhaustive encodings (except those
having fixed length) listed so far in this paper has an integer value for
Q. Rather than give the full details of a rigorous proof of Theorem 16,
only the main ideas involved will be explained. The sum Q is the average
length of the codes obtained by deciphering a presumed message, if the
presumed message was obtained by choosing 0's and 1's as successive
digits by independent choices having probability one-half. If we put
such a random presumed message into the deciphering circuit, we have 
several different phases in which it may be deciphered. By the never- 
self-synchronizing property no two of these phases can ever come to- 
gether. 

Let H be the set of all prefixes of the presumed message. Then two
of these prefixes will be said to be of the same phase if they are of the
form θ and θφ, where φ is the enciphered form of a complete message.
The set H is subdivided by the equivalence relation "being of the same
phase" into B distinct sets, where B is the number of phases. By
symmetry, a given member of H is equally likely to belong to each of the
phases, and, since these probabilities sum to 1, each phase occurs with
probability 1/B. However, Q was the expected difference in length between
a given member of H and the next longer member of the same phase;
hence, we will have Q = B.
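Formula (23) is easy to evaluate exactly with rational arithmetic. The sketch below uses illustrative function names. It computes Q and also the sum of 2^(-N_i), which serves as a check on the code list. For a prefix code, a sum of exactly 1 means that every digit sequence can be deciphered. Schutzenberger's nine-letter encoding, discussed below, gives Q = 3, an integer as Theorem 16 requires. The three-letter code {0, 10, 11} gives 3/2.

```python
from fractions import Fraction

def q_value(codes):
    """Q = sum over the codes of N_i * 2**(-N_i), as in (23)."""
    return sum(Fraction(len(c), 2 ** len(c)) for c in codes)

def kraft_sum(codes):
    """Sum of 2**(-N_i); equal to 1 for an exhaustive prefix code."""
    return sum(Fraction(1, 2 ** len(c)) for c in codes)

schutzenberger = ["000", "0010", "0011", "01", "100", "1010", "1011", "110", "111"]
print(kraft_sum(schutzenberger), q_value(schutzenberger))   # 1 3
print(q_value(["0", "10", "11"]))                          # 3/2
print(q_value([format(v, "03b") for v in range(8)]))       # 3 (fixed length 3)
```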

Theorem 17: Given an exhaustive encoding C, C is never-self-synchroniz- 
ing if and only if its reversal C* has the prefix property. 

Proof: Suppose that C is not never-self-synchronizing. By the definition of
synchronizing sequence, there exist finite sequences x, y and z such that
x is not the enciphered form of a message, but yz is the enciphered form
of message m_1 and xyz is the enciphered form of message m_2.

For some values of n the last n letters of m_1 may agree with the last
n letters of m_2. But, by the fact that x is not the enciphered form of a
message, there is a largest value of n for which this is true. Let this
largest value be n', and let the letters which are n' + 1 from the end of
m_1 and m_2, respectively, be called L_1 and L_2. Then C(L_1) and C(L_2) are
both suffixes of the same sequence (the part of xyz preceding the codes of
the last n' letters), and hence the reversed form of one of them is a prefix
of the reversed form of the other. Since L_1 and L_2 are different letters,
the reversal C* does not have the prefix property.

The converse follows more readily, since, if θ and θφ are both codes
of C*, then the reversed form of θ is a synchronizing sequence for the
reversed form of φ and the null sequence.
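Theorem 17 gives a purely combinatorial test. The sketch below assumes that the encoding is exhaustive, as the theorem requires. It reverses each code and checks whether any reversed code is a prefix of another. The function names are illustrative.

```python
def has_prefix_property(codes):
    """True if no code is a prefix of a different code."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

def reversal(codes):
    return [c[::-1] for c in codes]

def never_self_synchronizing(codes):
    """Theorem 17, for an exhaustive encoding: never-self-synchronizing
    exactly when the reversal C* has the prefix property."""
    return has_prefix_property(reversal(codes))

schutzenberger = ["000", "0010", "0011", "01", "100", "1010", "1011", "110", "111"]
print(never_self_synchronizing(schutzenberger))            # True
print(never_self_synchronizing(["0", "10", "11"]))         # False: reversal has 0 and 01
print(never_self_synchronizing(["00", "01", "10", "11"]))  # True: fixed length
```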

To return to the problem of which of cases (17) through (22) can occur, 
it can easily be shown by the use of Theorems 16 and 17 that, among 
all exhaustive encodings in which not all codes are of the same length, 
the only ones which are never-self-synchronizing and have fewer than 
16 letters in their alphabet are the encoding which encodes a nine-letter 
alphabet by using the list of codes (000, 0010, 0011, 01, 100, 1010, 1011, 
110, 111), and the reversal of this encoding. This encoding is due to
Schutzenberger [6].

This provides an example showing that case (22) can occur. That 
(21) can occur is shown by an encoding (derived from the above by 
composition) using the list of codes (000000, 0000010, 0000011, 00001, 
000100, 0001010, 0001011, 000110, 000111, 0010, 0011, 01, 100, 1010, 
1011, 110, 111). 
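The composed encoding can be reproduced, and its directly checkable properties confirmed. In the sketch, `compose` is an illustrative helper. It replaces the code 000 of Schutzenberger's encoding by 000 followed by each of its nine codes. The checks confirm that the result is an exhaustive prefix code. They also confirm that its code lengths have greatest common divisor 1. Finally, by Theorem 17, they confirm that the encoding is not never-self-synchronizing. The sketch does not check that the encoding also fails to be completely self-synchronizing.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def compose(codes, target, inner):
    """Replace the code `target` by target + c for every code c of `inner`."""
    return [target + c for c in inner] + [c for c in codes if c != target]

def has_prefix_property(codes):
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

schutzenberger = ["000", "0010", "0011", "01", "100", "1010", "1011", "110", "111"]
composed = compose(schutzenberger, "000", schutzenberger)

print(composed)                                             # the 17 codes listed in the text
print(sum(Fraction(1, 2 ** len(c)) for c in composed))      # 1: still exhaustive
print(has_prefix_property(composed))                        # True
print(reduce(gcd, map(len, composed)))                      # 1
print(has_prefix_property([c[::-1] for c in composed]))     # False, so not never-self-synchronizing
```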

It is also possible to construct an example of case (18), but the one 
we have found is too complicated to be worth presenting here. 

IX. ONE REALIZATION FOR ENCIPHERING AND DECIPHERING CIRCUITS 

Some reluctance to use variable-length encodings has been based on 
the opinion [10, 11] that it is hard to build circuits to encipher or decipher
them. Descriptions will be given below for one circuit for doing each
of these, using principally just a shift register and a combinational
translating circuit. Since using any code requires having a combinational
translating circuit, and since presumably most devices using coded
alphabetical information are likely to cause it to pass through a shift
register, the kind of circuit described below would add very little
complexity to such machines, and would automatically give them the
self-synchronizing property, in the case of most variable-length binary
encodings.

Fig. 1: Block diagram of enciphering circuit (arrow marking the direction of input).

The enciphering circuit, shown in Fig. 1, contains a shift register
whose left part is labeled "HAS JUST BEEN ENCIPHERED" in the figure,
followed by a binary digit 1 and a string of zeros as long as the longest code which
can occur in the variable-length encoding. We will assume that it is in 
such a state as to have the zeros as shown, although it can easily be seen 
that it will get into this state if it starts in any other condition. 

The circuit of Fig. 1 also contains an input reader (which can for 
concreteness be thought of as a punched paper tape reader, although it 
could be a buffer or other input device), which can read in one letter 
at a time whenever it is given a pulse on the lead labelled "to advance
input". 

The recognition circuit, which consists of a multiple-input OR circuit 
followed by a negation circuit, gives an output whenever there are as 
many binary zeros present as there are in the illustration. This sends a 
signal to enable the gate, letting the code corresponding to the next 
letter be read into the locations previously occupied by the 1 and all of 
the zeros. However, the translating circuit, which translates the letters 
into this encoding, instead of being designed to give directly the original 
variable-length encoding, gives an encoding which differs from it by 
having an extra "1" added to the end of each code. The output of the 
recognition circuit also goes to advance the input, reading in the next 
letter to be converted, after passing through a delay sufficient to be 
sure that the gate is now no longer enabled. This delay prevents the 
letter being translated from changing while it is being gated into the 
output shift register. 

As soon as the new code has been read into the shift register, it begins 
to be shifted along to the left in Fig. 1. The 1 at the end of the code 
serves to mark the end of the code during this shifting, but it will be 
eliminated from the enciphered form of the message. The shift register 
is connected so that, when it is shifted, a 0 appears at the right end.
As soon as the 1 passes beyond the end of the recognition circuit, there 
will be only zeros present, and hence the recognition circuit will again 
recognize the end of a letter and repeat the cycle as given above. 

Instead of having a counter or a special sequential circuit to keep 
track of where the current letter ends, this has been done here by add- 
ing a single binary digit to the code and adding one to the length 
of the required shift register. 
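The cycle just described can be simulated digit by digit. This sketch is a software model of Fig. 1, not a circuit design. `reg[0]` plays the role of the extra stage at the left end of the register. `reg[1:]` holds the N_max positions watched by the recognition circuit. The translating table is a hypothetical dictionary that maps letters to codes.

```python
def encipher(message, table):
    """Clock-by-clock model of the Fig. 1 enciphering circuit."""
    n_max = max(len(code) for code in table.values())
    reg = [1] + [0] * n_max              # idle state: end-marking 1 followed by zeros
    letters = iter(message)
    out = []
    while True:
        if not any(reg[1:]):             # OR circuit plus negation: only zeros present
            letter = next(letters, None) # "to advance input"
            if letter is None:
                break
            loaded = [int(b) for b in table[letter] + "1"]   # code plus extra 1
            # the gate overwrites the old marker and the zeros
            reg = loaded + [0] * (n_max + 1 - len(loaded))
        else:
            out.append(reg[0])           # digit leaving the extra stage is output
            reg = reg[1:] + [0]          # shift left; a 0 enters at the right end
    return "".join(map(str, out))

print(encipher("ABCA", {"A": "0", "B": "10", "C": "11"}))   # 010110
```

The end-marking 1 is always overwritten by the next load, so it never reaches the output.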

Similarly, an analogous scheme can be used to decipher from a
variable-length code into any other representation for letters, by using one special
position in the shift register, as shown in Fig. 2. This deciphering circuit 
can be built only for encodings having the finite delay property, although 
the enciphering circuit of Fig. 1 can be used for any binary encoding. 

The shift register into which the digits to be deciphered are shifted 
is divided into two halves, which will be called the left half and the 
right half. The right half has e digit positions, where e is the excess delay 
of the encoding. The left half has N_max + 1 digit positions, with the
extra 1 being used to mark the end of those digits which already have 
been deciphered. 

At the beginning of the cycle we will assume that the left half of the 
shift register has just been cleared to the state shown in Fig. 2, that is, 
to contain N_max 0's followed by a 1. Next, the digits of the message to
be deciphered shift toward the left. Since the 1 precedes them, it marks
clearly how many of these digits have been shifted into the left half. 
As soon as all of the digits of the code of the first letter of the message 
have been shifted into the left half, the translating circuit will then 
give its outputs. It gives the translated codes for the letter, as well as 
giving another output, w, which equals 1 only when the complete first 
letter is present. The translating circuit makes use of the inputs from 
only the left half of the shift register, ignoring the digits in the right 
half, unless the code C present in the left half is a code which is also a 
prefix of another code. It makes use only of those digits from the right 
half which are necessary to distinguish between this code and the
partially shifted-in code of which it is a prefix. It gives the output w = 1
whenever the entire code for the first letter of the enciphered message 
has been shifted over into the left half and, whenever only a prefix of 
the code of the first letter is there, the output w will equal zero. 

This output w will then cause the left half to clear back to its original 
state, and, after a delay sufficient to allow the output to be received, it 
gives the "to advance output" signal to the output punch or buffer. 

Fig. 2: Block diagram of deciphering circuit.

The deciphering circuit then repeats the above cycle for the next letter
of the message. 

The translating circuit of this deciphering circuit must give the ap- 
propriate outputs whenever the complete code for the first letter is 
present in the left half of the shift register, and must give w = 1 in these 
cases. It must also be designed to give the output w = 0 whenever an
incomplete prefix of the first letter is present, but, since in general there 
may be many states of the shift register which do not correspond to 
either a letter or a prefix, there may be many "don't cares" occurring 
in the design of this translating circuit, which will permit it to be simpler 
than a completely specified function having this many inputs. 

The time delay between the receipt of the beginning of an N-digit
code for a letter and the actual sending of this letter to the output punch
or buffer will be N + e, which may sometimes be slightly longer than
the delay d of the message. However, the circuit for doing the deciphering
in the minimum time would be more complicated, in that it would not
always clear the shift register to the same state, so it is not presented
here.
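The deciphering cycle can be modeled in the same way. The sketch assumes that e is at least the excess delay of the encoding. Under that assumption, a code in the left half together with the next e digits determines whether that code is the first letter. The translating circuit's output w is modeled by checking whether the digits in the right half can follow a complete code. At the end of the message, the right half is taken to hold all of the remaining digits. The two-letter encoding {A: 0, B: 01} is a hypothetical example. It is the reversal of the prefix code {0, 10}. One digit beyond the code always decides between A and B, so e = 1.

```python
from functools import lru_cache

def decipher(bits, table, e):
    """Model of the Fig. 2 deciphering circuit for excess delay e.
    `left` holds the digits that have passed the end-marking 1;
    bits[pos:pos + e] is the content of the right half."""
    inverse = {code: letter for letter, code in table.items()}
    codes = tuple(inverse)
    n_max = max(map(len, codes))

    def consistent(rest, final):
        # Can `rest` follow a complete code?  At the end of the message it must
        # be a whole number of codes; otherwise it may stop inside a code.
        @lru_cache(maxsize=None)
        def ok(i):
            if i == len(rest):
                return True
            if not final and any(c.startswith(rest[i:]) for c in codes):
                return True
            return any(rest.startswith(c, i) and ok(i + len(c)) for c in codes)
        return ok(0)

    out, left, pos = [], "", 0
    while pos < len(bits):
        left += bits[pos]                # one digit crosses into the left half
        pos += 1
        right = bits[pos:pos + e]
        final = pos + e >= len(bits)     # right half already holds the rest of the message
        w = left in inverse and consistent(right, final)
        if w:
            out.append(inverse[left])    # translated output; the left half is cleared
            left = ""
        elif len(left) > n_max:
            raise ValueError("digit stream is not a message of this encoding")
    if left:
        raise ValueError("message ended inside a code")
    return "".join(out)

print(decipher("0010", {"A": "0", "B": "01"}, 1))   # ABA
```

Each letter is output once its N digits and the following e digits have entered, which is the N + e delay stated above.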

However, in the enciphering circuit given in Fig. 1 there is only a 
delay of one digit time, while the message is shifted through the one 
extra stage at the left end of the shift register. Hence, neither of these
two circuits operates in quite the minimum possible time, since speed 
has been sacrificed for simplicity of construction. 

X. FURTHER PROBLEMS 

There are many further problems suggested by the ideas discussed 
in this paper, and which we have not been able to solve. Are there any 
binary encodings which satisfy (9) other than the exhaustive encodings 
and their reversals? Are there any encodings C which satisfy (9) and 
such that both C and its reversal C* have the finite delay property 
without both C and C* having the prefix property? Given an encoding 
which is uniquely decipherable but which does not possess the finite 
delay property, does the set of presumed messages having infinite delay 
always form a finite set? Does it always form a set of measure zero? Is 
there a simple polynomial in N max and n which will be an upper bound 
to the delay of any encoding having the finite delay property? Are the 
encodings for which the algorithm of Sardinas and Patterson [2] fails to
terminate precisely the same as the encodings having infinite delay? 
Given any encoding having infinite delay, is there a Turing machine 
(perhaps having several tapes and several reading and writing heads on 
each) which can decipher any K-digit message in a length of time which
is less than a constant times K?